CPSC 388 – Compiler Design and Construction

Scanner – Regular Expressions to DFA
Announcements
• ACM Programming contest (Tues 8pm)
• PROG 1 Feedback
• Linux Install Fest – When? Saturday? Fliers, CDROMs, bring laptops (do at own risk)
• LUG
• Understanding Editors (Eclipse, Vi, Emacs)
Scanners
[Diagram: Source Code → Lexical Analyzer (Scanner) → Token Stream. The scanner is derived in stages: Regular Expression → Nondeterministic Finite State Automata → Deterministic Finite State Automata.]
Regular Expressions
• An easy way to express a language that is accepted by an FSA
• Rules:
    ε is a regular expression
    Any symbol in Σ is a regular expression
    If r and s are regular expressions, then so are:
        r|s denotes union, i.e. "r or s"
        rs denotes r followed by s (concatenation)
        (r)* denotes concatenation of r with itself zero or more times (Kleene closure)
• () is used for controlling order of operations
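RE to NFA: Step 1
• Create a tree from the Regular Expression
• Example: (a(a|b))*
• Leaf Nodes are either members of Σ or ε
• Internal Nodes are operators cat, |, *
[Tree for (a(a|b))*: the root is *, and its only child is Cat. Cat has left child a and right child |. The | node has children a and b.]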
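Step 1 can be sketched in Python as a small recursive-descent parser. This is only an illustration: the function name parse, the nested-tuple tree layout, and the precedence order (* binds tightest, then concatenation, then |) are assumptions and are not taken from the slides.

```python
def parse(regex):
    """Parse a regular expression into a nested-tuple syntax tree.

    Leaves are single characters (a symbol of Σ, or "ε").
    Internal nodes are ("cat", l, r), ("|", l, r) and ("*", child).
    """
    pos = 0

    def peek():
        return regex[pos] if pos < len(regex) else None

    def alternation():            # lowest precedence: r|s
        nonlocal pos
        node = concatenation()
        while peek() == "|":
            pos += 1
            node = ("|", node, concatenation())
        return node

    def concatenation():          # middle precedence: rs
        node = star()
        while peek() is not None and peek() not in "|)":
            node = ("cat", node, star())
        return node

    def star():                   # highest precedence: (r)*
        nonlocal pos
        node = atom()
        while peek() == "*":
            pos += 1
            node = ("*", node)
        return node

    def atom():                   # a single symbol or a parenthesised RE
        nonlocal pos
        ch = peek()
        if ch is None or ch in "|)*":
            raise ValueError(f"unexpected {ch!r} at position {pos}")
        pos += 1
        if ch == "(":
            node = alternation()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return node
        return ch

    tree = alternation()
    if pos != len(regex):
        raise ValueError(f"unexpected {regex[pos]!r} at position {pos}")
    return tree


tree = parse("(a(a|b))*")
```

For the slide's example, tree is a * node whose child is a cat node with children a and (a|b), matching the tree described above.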
RE to NFA: Step 2
• Do a Post-Order Traversal of the Tree (children processed before parent)
• At each node follow the rules for conversion from an RE to an NFA
Leaf Nodes
• Either ε or a member of Σ
[Diagram: a leaf ε becomes S –ε→ F; a leaf a becomes S –a→ F. The tree for (a(a|b))* is shown again alongside.]
Internal Nodes
• Need to keep track of the left (l) and right (r) NFAs and merge them into a single NFA
• Or
• Concatenation
• Kleene Closure
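Or Node
[Diagram: a new start S has ε-transitions to the start states of l and r; the final states of l and r each have an ε-transition to a new final F.]
Concatenation Node
[Diagram: l followed by r; the final state of l serves as the start state of r.]
Kleene Closure
[Diagram: a new start S and a new final F around the operand NFA, with four ε-transitions: S to the operand's start, the operand's final to F, S directly to F (zero repetitions), and the operand's final back to its start (repetition).]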
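The leaf and internal-node rules can be sketched as a post-order traversal in Python. This is a sketch under stated assumptions: the tree is given as nested tuples (("cat", l, r), ("|", l, r), ("*", child), with one-character strings as leaves), states are numbered with integers rather than S, 1, 2, ..., F, and the function name thompson is illustrative.

```python
from itertools import count

EPS = "ε"


def thompson(tree):
    """Build an NFA from a syntax tree, children before parent.

    Returns (start, final, edges); edges is a list of
    (source, label, target) triples and label EPS is an ε-transition.
    """
    fresh = count()               # generator of unused state numbers
    edges = []

    def build(node):
        if isinstance(node, str):                 # leaf: ε or a symbol of Σ
            s, f = next(fresh), next(fresh)
            edges.append((s, node, f))
            return s, f
        if node[0] == "*":                        # Kleene closure node
            cs, cf = build(node[1])
            s, f = next(fresh), next(fresh)
            edges.extend([(s, EPS, cs), (cf, EPS, f),     # enter, leave
                          (s, EPS, f), (cf, EPS, cs)])    # skip, repeat
            return s, f
        ls, lf = build(node[1])                   # left NFA
        rs, rf = build(node[2])                   # right NFA
        if node[0] == "|":                        # or node
            s, f = next(fresh), next(fresh)
            edges.extend([(s, EPS, ls), (s, EPS, rs),
                          (lf, EPS, f), (rf, EPS, f)])
            return s, f
        if node[0] == "cat":                      # concatenation node
            # Merge r's start state into l's final state.
            edges[:] = [(lf if a == rs else a, x, lf if b == rs else b)
                        for a, x, b in edges]
            return ls, rf
        raise ValueError(f"unknown operator {node[0]!r}")

    start, final = build(tree)
    return start, final, edges


# Tree for (a|b)*abb, written out by hand.
abb_tree = ("cat", ("cat", ("cat", ("*", ("|", "a", "b")), "a"), "b"), "b")
start, final, edges = thompson(abb_tree)
```

Merging states on concatenation keeps the state count equal to that of the slide's NFA for (a|b)*abb (eleven states); linking l and r with an extra ε-transition would also be correct but would produce more states.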
Try It
• Convert the regular expression (a|b)*abb to an NFA
• First convert the RE to a tree
• Then convert the tree to an NFA
NFA to DFA
• Recall that a DFA can be represented as a transition table
[Transition table: one row per state (S, A, B) and one column per input character class (+, Digit); each cell holds the next state.]
Operations on NFA
• ε-closure(t) – Set of NFA states reachable from NFA state t on ε-transitions alone.
• ε-closure(T) – Set of NFA states reachable from some NFA state t in set T on ε-transitions alone.
• move(T,a) – Set of NFA states to which there is a transition on input symbol a from some state t in T.
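The two operations can be sketched in Python as follows. The dictionary encoding of the NFA and the function names are assumptions made for illustration. The edge list is a transcription of the standard NFA for (a|b)*abb with states S, 1–9, F, and it is assumed to match the diagram used in this lecture.

```python
EPS = "ε"

# NFA for (a|b)*abb: maps (state, label) to the set of target states.
NFA = {
    ("S", EPS): {"1", "7"},
    ("1", EPS): {"2", "4"},
    ("2", "a"): {"3"},
    ("4", "b"): {"5"},
    ("3", EPS): {"6"},
    ("5", EPS): {"6"},
    ("6", EPS): {"1", "7"},
    ("7", "a"): {"8"},
    ("8", "b"): {"9"},
    ("9", "b"): {"F"},
}


def epsilon_closure(nfa, states):
    """States reachable from any state in `states` on ε-transitions alone."""
    closure = set(states)
    stack = list(states)
    while stack:
        t = stack.pop()
        for u in nfa.get((t, EPS), ()):
            if u not in closure:      # visit each state only once
                closure.add(u)
                stack.append(u)
    return frozenset(closure)


def move(nfa, states, symbol):
    """States reachable by one `symbol` transition from a state in `states`."""
    return frozenset(u for t in states for u in nfa.get((t, symbol), ()))


start_closure = epsilon_closure(NFA, {"S"})
after_a = epsilon_closure(NFA, move(NFA, start_closure, "a"))
```

Here start_closure is the set {S, 1, 2, 4, 7} and after_a is {1, 2, 3, 4, 6, 7, 8}. Returning frozenset makes the results usable as dictionary keys, which is convenient when sets of NFA states become DFA states.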
NFA to DFA Algorithm
Initially, ε-closure(s) is the only state in the DFA and it is unmarked
while (there is an unmarked state T in the DFA) {
    mark T;
    for (each input symbol a) {
        U = ε-closure(move(T,a));
        if (U is not in the DFA)
            add U unmarked to the DFA;
        transition[T,a] = U;
    }
}
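The algorithm above can be sketched as runnable Python. Several things here are assumptions rather than course material: the dictionary encoding of the NFA, the function names, the edge list for the (a|b)*abb NFA, and the decision to skip the empty set instead of adding an explicit dead state.

```python
EPS = "ε"

# Assumed edge list for the (a|b)*abb NFA: (state, label) -> target states.
NFA = {
    ("S", EPS): {"1", "7"}, ("1", EPS): {"2", "4"},
    ("2", "a"): {"3"},      ("4", "b"): {"5"},
    ("3", EPS): {"6"},      ("5", EPS): {"6"},
    ("6", EPS): {"1", "7"}, ("7", "a"): {"8"},
    ("8", "b"): {"9"},      ("9", "b"): {"F"},
}


def epsilon_closure(nfa, states):
    closure, stack = set(states), list(states)
    while stack:
        for u in nfa.get((stack.pop(), EPS), ()):
            if u not in closure:
                closure.add(u)
                stack.append(u)
    return frozenset(closure)


def move(nfa, states, symbol):
    return frozenset(u for t in states for u in nfa.get((t, symbol), ()))


def nfa_to_dfa(nfa, start, accepting, alphabet):
    """Subset construction; each DFA state is a frozenset of NFA states."""
    start_set = epsilon_closure(nfa, {start})
    dfa_states = {start_set}
    unmarked = [start_set]
    transition = {}
    while unmarked:
        T = unmarked.pop()                    # mark T
        for a in alphabet:
            U = epsilon_closure(nfa, move(nfa, T, a))
            if not U:                         # assumption: no dead state
                continue
            if U not in dfa_states:
                dfa_states.add(U)             # add U unmarked to the DFA
                unmarked.append(U)
            transition[T, a] = U
    # A DFA state accepts if it contains an accepting NFA state.
    dfa_accepting = {T for T in dfa_states if T & accepting}
    return start_set, dfa_states, transition, dfa_accepting


start, states, transition, accepting = nfa_to_dfa(NFA, "S", {"F"}, "ab")
```

For this NFA the construction yields five DFA states, one of which (the set containing F) is accepting.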
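Try it
• Take the NFA from the previous example and construct the DFA
Regular Expression: (a|b)*abb
[NFA diagram with states S, 1–9, F. ε-transitions: S→1, S→7, 1→2, 1→4, 3→6, 5→6, 6→1, 6→7. Symbol transitions: 2 –a→ 3, 4 –b→ 5, 7 –a→ 8, 8 –b→ 9, 9 –b→ F.]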
Corresponding DFA
DFA state   NFA states        a   b
NewS        {S,1,2,4,7}       B   C
B           {1,2,3,4,6,7,8}   B   D
C           {1,2,4,5,6,7}     B   C
D           {1,2,4,5,6,7,9}   B   NewF
NewF        {1,2,4,5,6,7,F}   B   C
Start State and Accepting States
• The Start State for the DFA is ε-closure(s)
• The accepting states in the DFA are those states that contain an accepting state from the NFA
Efficiency of Algorithms
• RE -> NFA: O(|r|) where |r| is the size of the RE
• NFA -> DFA: O(|r|^2 · 2^|r|) in the worst case (not seen in typical programming languages)
• Recognition of a string by DFA: O(|x|) where |x| is the length of the string
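The O(|x|) bound for recognition follows from doing one table lookup per input character. A minimal Python sketch, using the five-state DFA for (a|b)*abb (state names NewS, B, C, D, NewF as on the Corresponding DFA slide; the transition entries are assumed from the subset construction, and the function name accepts is illustrative):

```python
# Transition table: state -> {input symbol -> next state}.
DFA = {
    "NewS": {"a": "B", "b": "C"},
    "B":    {"a": "B", "b": "D"},
    "C":    {"a": "B", "b": "C"},
    "D":    {"a": "B", "b": "NewF"},
    "NewF": {"a": "B", "b": "C"},
}


def accepts(x, dfa=DFA, start="NewS", accepting=frozenset({"NewF"})):
    """Run the DFA over x with one table lookup per character."""
    state = start
    for ch in x:
        state = dfa[state].get(ch)
        if state is None:         # no transition on ch: reject
            return False
    return state in accepting


results = [accepts(s) for s in ("abb", "babb", "ab", "abba")]
```

The strings abb and babb end in NewF and are accepted; ab and abba end in D and B respectively and are rejected.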
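More Practice
• Convert RE to NFA: ((ε|a)b*)*
• Convert NFA to DFA:
[NFA diagram with states S, 1, 2, 3, 4. ε-transitions: S→1, S→3. Symbol transitions: 1 –a→ 2, 2 –a→ 2, 3 –b→ 4, 4 –b→ 4.]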
Solution to Practice
• RE to NFA
[NFA diagram for ((ε|a)b*)*: states S, 1–9, F, with 13 ε-transitions, one transition on a and one transition on b.]
Solution to Practice
• NFA to DFA
DFA state   NFA states   a   b
NewS        {S,1,3}      A   B
A           {2}          A
B           {4}              B
Summary of Scanners
• Lexemes
• Tokens
• Regular Expressions, Extended RE
• Regular Definitions
• Finite Automata (DFA & NFA)
• Conversion from RE->NFA->DFA
• JLex