Conversion of Regular Expression to DFA and Use for Text Searching


CS-5800
Theory of Computation II
PROJECT PRESENTATION
By
Quincy Campbell
&
Sandeep Ravikanti
Text Searching
Conversion of Regular Expression to
DFA and Use for Text Searching.
Introduction
• Text searching is an application of finite-state automata concepts.
• A given regular expression is parsed and converted to a DFA (deterministic finite automaton).
• The resulting DFA is encoded as a transition table.
• Text searching is then implemented by running the input text through the transition table.
Regular Expression:
• Regular expressions consist of constants and operator symbols that denote
sets of strings and operations over these sets. A set of strings denoted by a
regular expression is called a regular set or regular language.
• Let Σ be an alphabet. The regular expressions over Σ are defined recursively:
• Basis: Ø, λ, and a, for every a ∈ Σ, are regular expressions over Σ.
• Recursive step: Let u and v be regular expressions over Σ. Then
(u ∪ v)
(uv)
(u*)
are regular expressions over Σ.
• Closure: u is a regular expression over Σ only if it can be obtained from the
basis elements by a finite number of applications of the recursive step.
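The recursive definition maps naturally onto a small tree data type. The following is a minimal sketch, assuming hypothetical class names (Empty, Lam, Sym, Union, Concat, Star) that are not taken from the project itself. The function accepts_lambda only illustrates how a property can be computed by recursion over the same three clauses.

```python
from dataclasses import dataclass

# Hypothetical names: one class per clause of the recursive definition.
@dataclass(frozen=True)
class Empty:            # basis: the empty set Ø
    pass

@dataclass(frozen=True)
class Lam:              # basis: the null string λ
    pass

@dataclass(frozen=True)
class Sym:              # basis: a single symbol a ∈ Σ
    ch: str

@dataclass(frozen=True)
class Union:            # recursive step: (u ∪ v)
    u: object
    v: object

@dataclass(frozen=True)
class Concat:           # recursive step: (uv)
    u: object
    v: object

@dataclass(frozen=True)
class Star:             # recursive step: (u*)
    u: object

def accepts_lambda(r):
    """Return True if the language denoted by r contains the null string λ."""
    if isinstance(r, (Lam, Star)):
        return True
    if isinstance(r, (Empty, Sym)):
        return False
    if isinstance(r, Union):
        return accepts_lambda(r.u) or accepts_lambda(r.v)
    if isinstance(r, Concat):
        return accepts_lambda(r.u) and accepts_lambda(r.v)
    raise TypeError(f"not a regular expression node: {r!r}")

# (ab)* built from the basis elements by a finite number of recursive steps.
ab_star = Star(Concat(Sym("a"), Sym("b")))
```

Here accepts_lambda(ab_star) is True, since a Kleene closure always contains λ.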
Deterministic Finite Automata:
A deterministic finite automaton M is a
quintuple (Q, Σ, δ, q0, F), consisting of
• a finite set of states (Q)
• a finite set of input symbols called the alphabet (Σ)
• a transition function (δ : Q × Σ → Q)
• a start state (q0 ∈ Q)
• a set of accept states (F ⊆ Q)
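As an illustration, the quintuple can be written down directly as Python values. This is a sketch under assumptions: the DFA below is a hand-built one for (ab)*, its state numbers are illustrative and need not match the project's transition table, and δ is stored as a partial function where a missing entry is treated as rejection (equivalent to an implicit dead state).

```python
# Hypothetical DFA for (ab)*; state numbers are illustrative only.
Q = {0, 1}                            # finite set of states
SIGMA = {"a", "b"}                    # input alphabet
DELTA = {(0, "a"): 1, (1, "b"): 0}    # partial δ: a missing entry means "reject"
Q0 = 0                                # start state
F = {0}                               # accept states

def accepts(text):
    """Run the DFA over text and report whether it ends in an accept state."""
    state = Q0
    for ch in text:
        if (state, ch) not in DELTA:
            return False              # no transition defined: implicit dead state
        state = DELTA[(state, ch)]
    return state in F
```

With this table, accepts("abab") is True and accepts("aba") is False.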
Text searching using the Kleene closure “*”:
Input string: (ab)*
Output: The final states are “2” and “0” in the transition table. Strings that can be obtained from the
given expression: {λ, ab, abab, ababab, …}
Kleene closure:
Input string: a*
Output: The final states are “2” and “0” in the transition table. Strings that can be obtained from the
given expression: {λ, a, aa, aaa, …}
Union “(a)+”:
Input: Union closure of “a+”
Output: Final state is “1”, with transitions from aa
Regular expression with parentheses, Kleene closure and union closure:
Input: ((ab)*(cd)+) with the string “When hug”
Output: The final state would be “4”, with no matches found in the given search string.
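The "no matches" result can be cross-checked with Python's built-in re module. This uses a different technique (a library regex engine, not the project's DFA) and assumes that the trailing "+" means "one or more repetitions", which is how re interprets it; the helper name search is hypothetical.

```python
import re

# Assumption: "+" is read as "one or more", as in Python's re syntax.
PATTERN = re.compile(r"(ab)*(cd)+")

def search(text):
    """Return the first substring of text matching the pattern, or None."""
    m = PATTERN.search(text)
    return m.group(0) if m else None
```

Under that reading, search("When hug") returns None, because the string contains no "cd".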
Issues:
• Limitations in handling transitions.
• ε-closure operations are necessary to handle the union operation.
• An example of an expression that doesn’t work…
New Approach
Based on the limitations of the previous approach,
we redesigned the text search around Thompson’s
algorithm. The regular expression is parsed into an
NFA (nondeterministic finite automaton) with
Thompson’s algorithm. The resulting NFA is converted
to a DFA (deterministic finite automaton) with the subset
construction algorithm. The generated DFA is encoded
as a transition table and used for text searching.
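The last step, searching text with a transition table, can be sketched as follows. This is one possible strategy and not necessarily the one the project uses: from each position the DFA is run as far as it can go, and the longest non-empty match is recorded. The function name find_matches and the sample table for (ab)* are assumptions made for illustration.

```python
def find_matches(delta, start, accept, text):
    """Return (begin, end) index pairs of leftmost-longest non-empty matches.

    delta maps (state, symbol) -> state; a missing entry means no transition.
    """
    matches = []
    i = 0
    while i < len(text):
        state, last_end = start, None
        for j in range(i, len(text)):
            state = delta.get((state, text[j]))
            if state is None:          # dead end: stop extending this attempt
                break
            if state in accept:        # remember the longest match so far
                last_end = j + 1
        if last_end is None:
            i += 1                     # no match starts here, try next position
        else:
            matches.append((i, last_end))
            i = last_end               # continue after the match
    return matches

# Hypothetical transition table for (ab)*: start state 0, accept state 0.
AB_STAR = {(0, "a"): 1, (1, "b"): 0}
hits = find_matches(AB_STAR, 0, {0}, "xxababyab")   # [(2, 6), (7, 9)]
```

The two hits correspond to the substrings "abab" and "ab"; a text with no occurrence, such as "When hug", yields an empty list.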
THOMPSON’S ALGORITHM
• The simplest method to convert a regular expression to an
NFA is Thompson's construction, also known as Thompson's
algorithm. Roughly speaking, it works by reducing the
regular expression to its smallest constituent regular
expressions, converting these to NFAs, and then joining these
NFAs together.
• It derives a nondeterministic finite automaton (NFA) from any
regular expression by splitting it into its constituent
subexpressions, from which the NFA is constructed using a
set of rules.
Rules of Thompson’s Algorithm
For a regular expression consisting of a single symbol such as “b”, the resulting NFA is as follows:
For the union of regular expressions, “a|b”:
For the Kleene star, “(a|b)*”:
Final NFA:
The NFA obtained after applying the rules of Thompson’s
algorithm.
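The rules can be sketched in code as functions that each return an NFA fragment with one start state and one final state. This is a minimal sketch under assumptions: the class name Builder and its methods are hypothetical, ε is represented by None, and concatenation links the two fragments with an ε-edge (a common variant; the textbook rule merges the two states instead).

```python
import itertools

class Builder:
    """Builds NFA fragments; each fragment is a (start, final) pair of states."""

    def __init__(self):
        self.trans = {}                  # state -> list of (symbol or None, target)
        self._ids = itertools.count()

    def new_state(self):
        s = next(self._ids)
        self.trans[s] = []
        return s

    def edge(self, src, sym, dst):
        self.trans[src].append((sym, dst))   # sym is None for an ε-transition

    def symbol(self, ch):                # rule for a single symbol
        s, f = self.new_state(), self.new_state()
        self.edge(s, ch, f)
        return s, f

    def union(self, a, b):               # rule for a|b
        s, f = self.new_state(), self.new_state()
        self.edge(s, None, a[0]); self.edge(s, None, b[0])
        self.edge(a[1], None, f); self.edge(b[1], None, f)
        return s, f

    def concat(self, a, b):              # rule for ab (ε-link variant)
        self.edge(a[1], None, b[0])
        return a[0], b[1]

    def star(self, a):                   # rule for a*
        s, f = self.new_state(), self.new_state()
        self.edge(s, None, a[0]); self.edge(s, None, f)
        self.edge(a[1], None, a[0]); self.edge(a[1], None, f)
        return s, f

def nfa_accepts(trans, start, final, text):
    """Simulate the NFA on text by tracking the set of reachable states."""
    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for sym, dst in trans[q]:
                if sym is None and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    current = closure({start})
    for ch in text:
        current = closure({d for q in current for sym, d in trans[q] if sym == ch})
    return final in current

# NFA for (a|b)*, combining the symbol, union and star rules.
nb = Builder()
start, final = nb.star(nb.union(nb.symbol("a"), nb.symbol("b")))
```

Each rule adds at most two new states, so the NFA for (a|b)* built this way has eight states in total.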
SUBSET CONSTRUCTION ALGORITHM:
1. Create the start state of the DFA by taking the ε-closure of the start
state of the NFA.
2. Perform the following for the new DFA state:
For each possible input symbol:
   1. Apply move to the newly created state and the input symbol; this
   returns a set of states.
   2. Apply the ε-closure to this set of states, possibly resulting in a
   new set.
3. This set of NFA states will be a single state in the DFA.
4. Each time we generate a new DFA state, we must apply step 2 to it.
The process is complete when applying step 2 does not yield any new
states.
5. The finish states of the DFA are those which contain any of the finish
states of the NFA.
Example: If A, B and C are states, move({A, B, C}, ‘a’) = move(A, ‘a’) ∪
move(B, ‘a’) ∪ move(C, ‘a’).
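The steps above can be sketched as a worklist loop. This is an illustrative sketch, not the project's code: the function names, the dictionary representation of the NFA (None stands for ε), and the small example NFA for a*b are all assumptions.

```python
def epsilon_closure(nfa, states):
    """All states reachable from `states` using only ε-transitions."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for sym, dst in nfa.get(q, []):
            if sym is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def move(nfa, states, symbol):
    """Union of move(q, symbol) over every q in `states`."""
    return {dst for q in states for sym, dst in nfa.get(q, []) if sym == symbol}

def subset_construction(nfa, start, finals, alphabet):
    start_set = frozenset(epsilon_closure(nfa, {start}))       # step 1
    dfa, seen, worklist = {}, {start_set}, [start_set]
    while worklist:                                            # step 4
        current = worklist.pop()
        for sym in alphabet:                                   # step 2
            target = frozenset(epsilon_closure(nfa, move(nfa, current, sym)))
            if not target:
                continue                                       # no transition
            dfa[(current, sym)] = target                       # step 3
            if target not in seen:
                seen.add(target)
                worklist.append(target)
    accepting = {s for s in seen if s & finals}                # step 5
    return start_set, dfa, accepting

# Hypothetical NFA for a*b: state -> list of (symbol or None for ε, target).
NFA = {
    0: [(None, 1), (None, 3)],
    1: [("a", 2)],
    2: [(None, 1), (None, 3)],
    3: [("b", 4)],
    4: [],
}
dfa_start, dfa_delta, dfa_accept = subset_construction(NFA, 0, {4}, "ab")
```

For this NFA the start state of the DFA is {0, 1, 3}, reading a leads to {1, 2, 3}, and reading b from either of those leads to the single accepting state {4}.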
Consider an example of applying the subset construction algorithm
to generate a DFA from a given NFA.
Given NFA:
Creating the start state of the DFA by taking the ε-closure of the NFA start state:
The Final DFA:
[Figure: transition diagram of the final DFA over states q0, q1, q2 and q3, with edges labeled a, b and c]
q0 = {1,3,5,7,8,9} (start state)
q1 = {1,2,3,5,6,8,9}
q2 = {1,3,4,5,6,8,9}
q3 = {10} (final state)
Sample Output
Conclusion:
We conclude that text searching is
implemented using the transition table
of the DFA obtained for the given regular expression.
References :
Sudkamp, Thomas A., Languages and Machines: An Introduction
to the Theory of Computer Science
Hopcroft, J.E., and Ullman, J.D. [1979], Introduction to Automata
Theory, Languages, and Computation, Addison-Wesley, Reading, MA
https://class.coursera.org/automata/lecture/preview
http://en.wikipedia.org/wiki/Thompson's_construction_algorithm
Thank you