CPSC 388 – Compiler Design and Construction
Scanner – Regular Expressions to DFA
Announcements
ACM Programming contest
(Tues 8pm)
PROG 1 Feedback
Linux Install Fest – When?
Saturday?, Fliers, CDROMS, Bring
Laptops (do at own risk)
LUG
Understanding Editors (Eclipse, Vi,
Emacs)
Scanners
Source Code -> Lexical Analyzer (Scanner) -> Token Stream
Regular Expression -> Nondeterministic Finite State Automata -> Deterministic Finite State Automata
Regular Expressions
Easy way to express a language that is accepted by an FSA
Rules:
ε is a regular expression
Any symbol in Σ is a regular expression
If r and s are regular expressions, then so are:
r|s denotes union, i.e. “r or s”
rs denotes r followed by s (concatenation)
(r)* denotes concatenation of r with itself zero or more times (Kleene closure)
() used for controlling order of operations
RE to NFA: Step 1
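Create a tree from the Regular Expression
Example: (a(a|b))*
•Leaf Nodes are either members of Σ or ε
•Internal Nodes are operators: cat, |, *
Tree for the example: the root is *, its child is Cat, Cat's children are a and |, and |'s children are a and b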
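One way to build such a tree mechanically is a small recursive-descent parser. The sketch below is an illustration, not something taken from the slides: the function name `parse` and the nested-tuple node encoding are assumptions chosen for brevity. Operator precedence follows the usual convention (* binds tightest, then concatenation, then |).

```python
# Sketch: parse a regular expression into a tree of nested tuples.
# Node shapes (an assumed encoding, not from the slides):
#   ("sym", c) leaf for a symbol of Σ        ("eps",) leaf for ε
#   ("cat", l, r), ("or", l, r), ("star", child) internal operator nodes

def parse(regex):
    pos = 0

    def peek():
        return regex[pos] if pos < len(regex) else None

    def parse_or():                      # lowest precedence: r|s
        nonlocal pos
        node = parse_cat()
        while peek() == "|":
            pos += 1
            node = ("or", node, parse_cat())
        return node

    def parse_cat():                     # middle precedence: rs
        node = parse_star()
        while peek() is not None and peek() not in "|)":
            node = ("cat", node, parse_star())
        return node

    def parse_star():                    # highest precedence: (r)*
        nonlocal pos
        node = parse_atom()
        while peek() == "*":
            pos += 1
            node = ("star", node)
        return node

    def parse_atom():                    # a symbol, ε, or a parenthesized RE
        nonlocal pos
        ch = peek()
        if ch is None or ch in "|)*":
            raise ValueError(f"unexpected input at position {pos}")
        pos += 1
        if ch == "(":
            node = parse_or()
            if peek() != ")":
                raise ValueError("missing ')'")
            pos += 1
            return node
        return ("eps",) if ch == "ε" else ("sym", ch)

    tree = parse_or()
    if pos != len(regex):
        raise ValueError(f"unexpected input at position {pos}")
    return tree


print(parse("(a(a|b))*"))
# ('star', ('cat', ('sym', 'a'), ('or', ('sym', 'a'), ('sym', 'b'))))
```

The printed tuple has the same shape as the example tree: * at the root, Cat below it, and | as Cat's right child.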
RE to NFA: Step 2
Do a Post-Order Traversal of Tree (children processed before parent)
At each node follow rules for conversion from an RE to an NFA
Leaf Nodes
Either ε or member of Σ
NFA for an ε leaf: S --ε--> F
NFA for a leaf a in Σ: S --a--> F
Internal Nodes
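Need to keep track of the left (l) and right (r) NFA and merge them into a single NFA
Or
Concatenation
Kleene Closure
Or Node
New start S with ε-transitions to the starts of l and r; the ends of l and r each have an ε-transition to a new final state F
Concatenation Node
l followed by r: the end of l leads into the start of r
Kleene Closure
New start S and new final F joined by four ε-transitions: S into the operand NFA, the end of the operand NFA out to F, S directly to F (zero repetitions), and the end of the operand NFA back to its start (more repetitions)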
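The leaf and internal-node rules can be sketched as a post-order walk over the tree. Everything named here (`build_nfa`, `accepts`, the tuple encoding of tree nodes, integer state names, `None` as the ε label) is an assumption made for the illustration. One further assumption: concatenation is glued with an extra ε edge, which is simpler to code than merging two states and accepts the same language.

```python
import itertools

EPS = None  # label used for ε-transitions (an assumed encoding)

def build_nfa(tree):
    """Post-order walk: build NFAs for the children, then combine them."""
    counter = itertools.count()
    edges = []                                  # (from_state, label, to_state)

    def visit(node):
        kind = node[0]
        if kind in ("sym", "eps"):              # leaf: S --label--> F
            s, f = next(counter), next(counter)
            edges.append((s, node[1] if kind == "sym" else EPS, f))
            return s, f
        if kind == "cat":                       # run l, then r
            ls, lf = visit(node[1])
            rs, rf = visit(node[2])
            edges.append((lf, EPS, rs))         # assumption: glue with an ε edge
            return ls, rf
        if kind == "or":                        # branch into l or r
            ls, lf = visit(node[1])
            rs, rf = visit(node[2])
            s, f = next(counter), next(counter)
            edges.extend([(s, EPS, ls), (s, EPS, rs), (lf, EPS, f), (rf, EPS, f)])
            return s, f
        if kind == "star":                      # zero or more copies of the child
            cs, cf = visit(node[1])
            s, f = next(counter), next(counter)
            edges.extend([(s, EPS, cs), (cf, EPS, f), (s, EPS, f), (cf, EPS, cs)])
            return s, f
        raise ValueError(f"unknown node kind: {kind}")

    start, final = visit(tree)
    return start, final, edges

def accepts(nfa, text):
    """Helper used only to exercise the result: simulate the NFA on text."""
    start, final, edges = nfa

    def closure(states):
        seen, stack = set(states), list(states)
        while stack:
            s = stack.pop()
            for src, label, dst in edges:
                if src == s and label is EPS and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    current = closure({start})
    for ch in text:
        current = closure({d for s, l, d in edges if s in current and l == ch})
    return final in current

# Hand-built tree for (a(a|b))*
tree = ("star", ("cat", ("sym", "a"), ("or", ("sym", "a"), ("sym", "b"))))
nfa = build_nfa(tree)
print(accepts(nfa, "aaab"), accepts(nfa, "aba"))   # True False
```

Because children are visited before their parent, each operator node only ever sees the finished (start, final) pair of its operands.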
Try It
Convert the regular expression (a|b)*abb to an NFA
First convert RE to a tree
Then convert tree to NFA
NFA to DFA
Recall that a DFA can be represented
as a transition table
[Transition table: rows are the states S, A, B; columns are the input characters, e.g. + and Digit; each entry is the next state]
Operations on NFA
ε-closure(t) – Set of NFA states reachable from NFA state t on ε-transitions alone.
ε-closure(T) – Set of NFA states reachable from some NFA state t in set T on ε-transitions alone.
move(T,a) – Set of NFA states to which there is a transition on input symbol a from some state t in T.
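These operations can be sketched in a few lines. The dictionary encoding of the NFA, the `EPS` marker, and the function names are assumptions for illustration; the example NFA is the one for (a|b)*abb used in this lecture, with its transitions transcribed from the diagram as an assumption.

```python
EPS = "ε"  # assumed marker for ε-transitions

# NFA for (a|b)*abb: (state, label) -> set of next states.
# Transcribed from the lecture's diagram; treat the exact edges as an assumption.
NFA = {
    ("S", EPS): {"1", "7"}, ("1", EPS): {"2", "4"},
    ("2", "a"): {"3"},      ("4", "b"): {"5"},
    ("3", EPS): {"6"},      ("5", EPS): {"6"},
    ("6", EPS): {"1", "7"},
    ("7", "a"): {"8"}, ("8", "b"): {"9"}, ("9", "b"): {"F"},
}

def epsilon_closure(nfa, states):
    """All states reachable from `states` using ε-transitions alone."""
    closure = set(states)
    worklist = list(states)
    while worklist:
        t = worklist.pop()
        for u in nfa.get((t, EPS), ()):
            if u not in closure:        # visit each state at most once
                closure.add(u)
                worklist.append(u)
    return closure

def move(nfa, states, symbol):
    """All states reachable by one `symbol` transition from some state in `states`."""
    result = set()
    for t in states:
        result |= nfa.get((t, symbol), set())
    return result

start = epsilon_closure(NFA, {"S"})
print(sorted(start))                                        # ['1', '2', '4', '7', 'S']
print(sorted(epsilon_closure(NFA, move(NFA, start, "a"))))  # ['1', '2', '3', '4', '6', '7', '8']
```

Passing a one-element set covers the single-state form ε-closure(t), so one function serves both definitions.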
NFA to DFA Algorithm
Initially ε-closure(s) is the only state in DFA and it is unmarked
while (there is an unmarked state T in DFA) {
    mark T;
    for (each input symbol a) {
        U = ε-closure(move(T,a));
        if (U is not in DFA)
            add U unmarked to DFA;
        transition[T,a] = U;
    }
}
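A minimal runnable sketch of this loop follows. The names (`subset_construction`, `NFA`, `ALPHABET`) and the dictionary encoding are assumptions, and the NFA is the (a|b)*abb machine transcribed from the lecture diagram. One deviation is labeled in the code: an empty set U is skipped rather than added as a dead state.

```python
from collections import deque

EPS = "ε"                       # assumed marker for ε-transitions
ALPHABET = ("a", "b")
NFA_START, NFA_FINAL = "S", "F"
# NFA for (a|b)*abb, transcribed from the diagram (an assumption).
NFA = {
    ("S", EPS): {"1", "7"}, ("1", EPS): {"2", "4"},
    ("2", "a"): {"3"},      ("4", "b"): {"5"},
    ("3", EPS): {"6"},      ("5", EPS): {"6"},
    ("6", EPS): {"1", "7"},
    ("7", "a"): {"8"}, ("8", "b"): {"9"}, ("9", "b"): {"F"},
}

def epsilon_closure(states):
    closure, worklist = set(states), list(states)
    while worklist:
        t = worklist.pop()
        for u in NFA.get((t, EPS), ()):
            if u not in closure:
                closure.add(u)
                worklist.append(u)
    return closure

def move(states, symbol):
    return {u for t in states for u in NFA.get((t, symbol), ())}

def subset_construction():
    start = frozenset(epsilon_closure({NFA_START}))
    dfa_states = {start}              # every DFA state found so far
    unmarked = deque([start])         # states whose transitions are not yet computed
    transition = {}
    while unmarked:
        T = unmarked.popleft()        # taking T off the queue "marks" it
        for a in ALPHABET:
            U = frozenset(epsilon_closure(move(T, a)))
            if not U:
                continue              # assumption: omit the dead state, leave entry empty
            if U not in dfa_states:
                dfa_states.add(U)
                unmarked.append(U)
            transition[T, a] = U
    # A DFA state accepts if it contains the NFA's accepting state.
    accepting = {T for T in dfa_states if NFA_FINAL in T}
    return start, transition, accepting

start, transition, accepting = subset_construction()
dfa_states = {start} | set(transition.values())
print(len(dfa_states), "DFA states,", len(transition), "transitions")
# 5 DFA states, 10 transitions
```

Using `frozenset` lets each set of NFA states serve directly as a dictionary key, so "U not in DFA" becomes an ordinary set-membership test.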
Try it
Take NFA from previous example and
construct DFA
Regular Expression: (a|b)*abb
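NFA transitions (start state S, accepting state F):
S --ε--> 1, S --ε--> 7
1 --ε--> 2, 1 --ε--> 4
2 --a--> 3, 4 --b--> 5
3 --ε--> 6, 5 --ε--> 6
6 --ε--> 1, 6 --ε--> 7
7 --a--> 8, 8 --b--> 9, 9 --b--> F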
Corresponding DFA
DFA State   NFA States          a    b
NewS        {S,1,2,4,7}         B    C
B           {1,2,3,4,6,7,8}     B    D
C           {1,2,4,5,6,7}       B    C
D           {1,2,4,5,6,7,9}     B    NewF
NewF        {1,2,4,5,6,7,F}     B    C
Start State and Accepting States
The Start State for the DFA is
ε-closure(s)
The accepting states in the DFA are
those states that contain an accepting
state from the NFA
Efficiency of Algorithms
RE -> NFA
O(|r|) where |r| is the size of the RE
NFA -> DFA
O(|r|^2 · 2^|r|) – worst case
(not seen in typical programming languages)
Recognition of a string by DFA
O(|x|) where |x| is length of string
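The linear bound for recognition comes from doing one table lookup per input character. A minimal sketch, assuming the (a|b)*abb DFA from this lecture with its state names; the table entries are reconstructed from the example and should be read as an assumption, as should the names `TABLE` and `recognizes`.

```python
# Transition table for the (a|b)*abb DFA: state -> {symbol -> next state}.
# Entries are reconstructed from the lecture example (an assumption).
TABLE = {
    "NewS": {"a": "B", "b": "C"},
    "B":    {"a": "B", "b": "D"},
    "C":    {"a": "B", "b": "C"},
    "D":    {"a": "B", "b": "NewF"},
    "NewF": {"a": "B", "b": "C"},
}
START, ACCEPTING = "NewS", {"NewF"}

def recognizes(text):
    state = START
    for ch in text:                    # exactly one table lookup per character
        state = TABLE[state].get(ch)
        if state is None:              # no transition on this character: reject
            return False
    return state in ACCEPTING

print(recognizes("babb"), recognizes("abba"))   # True False
```

The loop never backtracks or tracks more than one state, which is why the cost depends only on the length of the string and not on the size of the regular expression.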
More Practice
Convert RE to NFA
((ε|a)b*)*
Convert NFA to DFA
NFA (start state S): ε-transitions lead from S to states 1 and 3
1 --a--> 2, 2 --a--> 2
3 --b--> 4, 4 --b--> 4
Solution to Practice
RE to NFA
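NFA transitions for ((ε|a)b*)* (start state S, accepting state F):
S --ε--> 1, S --ε--> F
1 --ε--> 2, 1 --ε--> 4
2 --ε--> 3, 4 --a--> 5
3 --ε--> 6, 5 --ε--> 6
6 --ε--> 7, 6 --ε--> 9
7 --b--> 8
8 --ε--> 7, 8 --ε--> 9
9 --ε--> 1, 9 --ε--> F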
Solution to Practice
NFA to DFA
DFA State   NFA States   a    b
NewS        {S,1,3}      A    B
A           {2}          A
B           {4}               B
Summary of Scanners
Lexemes
Tokens
Regular Expressions, Extended RE
Regular Definitions
Finite Automata (DFA & NFA)
Conversion from RE->NFA->DFA
JLex