Compiler Principles Courseware - Xidian University Personal Homepage
Summary of Lexical Analysis
Cong Tian
1
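General Method and Steps for Constructing a Lexical Analyzer
1. Describe: specify each token pattern with a regular expression.
2. Construct NFAs: build an NFA for each regular expression.
3. Determinize: convert the NFA into an equivalent DFA.
4. Minimize: optimize the DFA so that it has the fewest states.
5. Build the lexical analyzer from the DFA (table-driven, direct-coded, or generated by LEX).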
2
Formal Concepts Involved
Descriptions: regular expression, NFA, DFA
For example: Char(char|digit)*, …
Described object: regular language (regular set)
Strings, for example: abb, A111, …
3
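Regular Languages
[Figure: nested language classes, from outermost to innermost: recursively enumerable languages, context-sensitive languages, context-free languages, regular languages.]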
4
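Regular Languages
Regular languages: regular expressions; finite-state automata
Context-free languages: context-free grammars; nondeterministic pushdown automata
Context-sensitive languages: context-sensitive grammars; linear bounded automata (a restricted kind of Turing machine)
Recursively enumerable languages: phrase-structure grammars; Turing machines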
5
Topics on Regular Languages
How do we prove that regular expressions and NFAs are equivalent?
(1) We need to show that for every RE, there is an automaton that accepts the same language.
(2) And for every automaton, there is an RE defining its language.
6
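RE to ε-NFA: Basis
Symbol a: [Figure: a start state with one transition labeled a to a final state.]
ε: [Figure: a start state with one transition labeled ε to a final state.]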
7
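RE to ε-NFA: Induction 1 – Union
[Figure: the ε-NFA for E1 ∪ E2. A new start state has ε-transitions to the automaton for E1 and to the automaton for E2; each of these has an ε-transition from its final state to a new final state.]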
8
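RE to ε-NFA: Induction 2 – Concatenation
[Figure: the ε-NFA for E1E2. The automaton for E1 is followed by the automaton for E2, with an ε-transition from the final state of E1 to the start state of E2.]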
9
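RE to ε-NFA: Induction 3 – Closure
[Figure: the ε-NFA for E*. A new start state and a new final state surround the automaton for E, connected by four ε-transitions: new start to the start of E, final of E to the new final, final of E back to the start of E, and new start directly to the new final.]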
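The basis and the three inductive cases can be sketched as one recursive builder. This is a minimal sketch in Python, not the course's implementation: the tuple encoding of expressions and the names `Fragment`, `build`, `eps_closure`, and `accepts` are assumptions made for illustration.

```python
from itertools import count

EPS = None        # label used for ε-transitions
_ids = count()    # source of fresh state numbers

class Fragment:
    """An ε-NFA with one start state, one final state, and a list of edges."""
    def __init__(self, start, final, edges):
        self.start, self.final, self.edges = start, final, edges

def build(expr):
    """expr is ('sym', a), ('eps',), ('union', e1, e2), ('cat', e1, e2) or ('star', e)."""
    kind = expr[0]
    if kind == 'sym':                         # basis: one edge labeled a
        s, f = next(_ids), next(_ids)
        return Fragment(s, f, [(s, expr[1], f)])
    if kind == 'eps':                         # basis: one ε-edge
        s, f = next(_ids), next(_ids)
        return Fragment(s, f, [(s, EPS, f)])
    if kind == 'union':                       # new start and final around both parts
        a, b = build(expr[1]), build(expr[2])
        s, f = next(_ids), next(_ids)
        extra = [(s, EPS, a.start), (s, EPS, b.start),
                 (a.final, EPS, f), (b.final, EPS, f)]
        return Fragment(s, f, a.edges + b.edges + extra)
    if kind == 'cat':                         # final of E1 feeds start of E2
        a, b = build(expr[1]), build(expr[2])
        return Fragment(a.start, b.final,
                        a.edges + b.edges + [(a.final, EPS, b.start)])
    if kind == 'star':                        # loop back, plus a bypass for zero copies
        a = build(expr[1])
        s, f = next(_ids), next(_ids)
        extra = [(s, EPS, a.start), (a.final, EPS, f),
                 (a.final, EPS, a.start), (s, EPS, f)]
        return Fragment(s, f, a.edges + extra)
    raise ValueError(kind)

def eps_closure(states, edges):
    """All states reachable from `states` using ε-edges only."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for src, label, dst in edges:
            if src == q and label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(frag, w):
    """Simulate the ε-NFA on w by tracking the set of current states."""
    current = eps_closure({frag.start}, frag.edges)
    for ch in w:
        moved = {dst for src, label, dst in frag.edges
                 if src in current and label == ch}
        current = eps_closure(moved, frag.edges)
    return frag.final in current

# a(b+c)* written in the assumed tuple encoding
EXAMPLE = ('cat', ('sym', 'a'), ('star', ('union', ('sym', 'b'), ('sym', 'c'))))
```

Each case adds at most two states, so the automaton grows linearly with the size of the expression.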
10
Topics on Regular Languages
How do we prove that regular expressions and NFAs are equivalent?
(1) We need to show that for every RE, there is an automaton that accepts the same language.
(2) And for every automaton, there is an RE defining its language.
11
From Automata to RE
Arden’s rule
For any sets of strings S and T, the equation X = SX + T has X = S*T as a solution. Moreover, this solution is unique if ε is not in S.
12
From Automata to RE
Given an automaton A
A has states {q0, …, qn} with q0 being the start state
Let Xi denote the set of strings accepted by A starting in
state qi
Thus, L(A)=X0
We can write an equation for each Xi, defining it in terms of
the sets corresponding to its successor states.
13
From Automata to RE
[Figure: transition diagram of the automaton A0 with states q0, q1, q2, q3 and edge labels drawn from {a, b, c}.]
14
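From Automata to RE
[Figure: transition diagram of the automaton A0 with states q0, q1, q2, q3.]
Equations for A0:
(0) X0 = aX1 + bX3 + cX3
(1) X1 = aX3 + bX2 + cX0 + ε
(2) X2 = aX3 + bX3 + cX0
(3) X3 = aX3 + bX3 + cX3
Equation (3) is X3 = (a+b+c)X3 + ∅, so by Arden’s rule:
X3 = (a+b+c)*∅ = ∅
With X3 = ∅ the system reduces to:
(0) X0 = aX1
(1) X1 = bX2 + cX0 + ε
(2) X2 = cX0
Substituting (0) and (2) in (1):
X1 = (bc+c)aX1 + ε
= ((bc+c)a)* (by Arden’s rule)
X0 = a((bc+c)a)*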
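The derived expression can be checked against the automaton by brute force. The sketch below reads the transitions off equations (0) to (3), treats q1 as the only accepting state because equation (1) is the only one with an ε term, and compares the DFA with the regular expression on every string up to a chosen length. The names `DELTA`, `dfa_accepts`, and `agree_up_to` are illustrative, and agreement on short strings is a sanity check rather than a proof.

```python
import re
from itertools import product

# One entry per term of equations (0)-(3): Xi contains the term aXj iff δ(qi, a) = qj.
DELTA = {
    ('q0', 'a'): 'q1', ('q0', 'b'): 'q3', ('q0', 'c'): 'q3',
    ('q1', 'a'): 'q3', ('q1', 'b'): 'q2', ('q1', 'c'): 'q0',
    ('q2', 'a'): 'q3', ('q2', 'b'): 'q3', ('q2', 'c'): 'q0',
    ('q3', 'a'): 'q3', ('q3', 'b'): 'q3', ('q3', 'c'): 'q3',
}
FINAL = {'q1'}                                # the ε term appears only in X1

# a((bc+c)a)* written in Python regex syntax
DERIVED = re.compile(r'a(?:(?:bc|c)a)*')

def dfa_accepts(w):
    """Run the DFA from q0 and report whether it ends in an accepting state."""
    q = 'q0'
    for ch in w:
        q = DELTA[(q, ch)]
    return q in FINAL

def agree_up_to(max_len):
    """True if the DFA and the regex agree on all strings of length <= max_len."""
    for n in range(max_len + 1):
        for letters in product('abc', repeat=n):
            w = ''.join(letters)
            if dfa_accepts(w) != (DERIVED.fullmatch(w) is not None):
                return False
    return True
```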
15
DFA-to-RE
Another approach
See page 93, Theorem 3.4 of 形式语言与自动机 (Formal Languages and Automata).
Induction on k-path.
16
Algebraic Laws for RE’s
Union and concatenation behave sort of like addition and
multiplication.
+ is commutative and associative:
a+b = b+a, (a+b)+c = a+(b+c)
Concatenation is associative:
(a.b).c = a.(b.c)
Concatenation distributes over +:
a.(b+c) = a.b+a.c
Exception: concatenation is not commutative:
a.b ≠ b.a
17
Identities and Annihilators
∅ is the identity for +.
R + ∅ = R.
ε is the identity for concatenation.
εR = Rε = R.
∅ is the annihilator for concatenation.
∅R = R∅ = ∅.
18
Decision Properties of Regular Languages
Cong Tian
19
Properties of Language Classes
A language class is a set of languages.
We have one example: the regular languages.
Language classes have two important kinds of properties:
1. Decision properties.
2. Closure properties.
20
Decision Properties
A decision property for a class of languages is an
algorithm that takes a formal description of a
language (e.g., a DFA) and tells whether or not some
property holds.
Example: Is language L empty?
The representation is a DFA.
Can you tell if L(A) = ∅ for DFA A?
21
Why Decision Properties?
When we talked about protocols represented as
DFA’s, we noted that important properties of a
good protocol were related to the language of the
DFA.
Example: “Does the protocol terminate?” = “Is
the language finite?”
Example: “Can the protocol fail?” = “Is the
language nonempty?”
22
Why Decision Properties – (2)
We might want a “smallest” representation for
a language, e.g., a minimum-state DFA or a
shortest RE.
If you can’t decide “Are these two languages the same?”, i.e., whether two DFA’s define the same language, then you can’t find a “smallest.”
23
Closure Properties
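A closure property of a language class says that given languages in the class, an operator (e.g., union) produces another language in the same class.
Example: the regular languages are obviously closed under union, concatenation, and (Kleene) closure. This follows from the definition of regular expressions:
ε is a regular expression.
If a is a symbol in Σ, then a is a regular expression.
If r and s are regular expressions over Σ, then
(a) r|s is a regular expression;
(b) rs is a regular expression;
(c) r* is a regular expression.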
24
The Membership Question
Our first decision property is the membership question: “Is string w in regular language L?”
Assume L is represented by a DFA A.
Simulate the action of A on the sequence of input
symbols forming w.
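The simulation amounts to one table lookup per input symbol. A minimal sketch follows; the dictionary encoding of the transition function and the three-state DFA (states A, B, C over {0, 1}, accepting A and B) are assumptions chosen for illustration, not details confirmed by the source.

```python
def accepts(delta, start, finals, w):
    """Run the DFA on w and report whether it stops in a final state."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]   # exactly one move per input symbol
    return state in finals

# Assumed example: this DFA rejects exactly the strings that contain "11".
DELTA = {
    ('A', '0'): 'A', ('A', '1'): 'B',
    ('B', '0'): 'A', ('B', '1'): 'C',
    ('C', '0'): 'C', ('C', '1'): 'C',
}
FINALS = {'A', 'B'}
```

The running time is proportional to |w|, independent of the number of states.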
25
Example: Testing Membership
Input w = 01011.
[Figure, shown as six animation frames: a DFA with states A (start), B, and C and transitions A –0→ A, A –1→ B, B –0→ A, B –1→ C, C –0,1→ C. Markers track the next input symbol and the current state.]
Trace: A –0→ A –1→ B –0→ A –1→ B –1→ C.
31
The Emptiness Problem
Given a regular language, does the language contain any string at all? (This is the emptiness problem.)
Assume representation is DFA.
Construct the transition graph.
Compute the set of states reachable from the
start state.
If any final state is reachable, then yes, else no.
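The reachability test can be sketched as a breadth-first search over the transition graph. The encoding of the DFA as a dictionary from (state, symbol) to state and the function names are assumptions for illustration. Note that `is_empty` returns the negation of the slide's question: it is true when no final state is reachable.

```python
from collections import deque

def reachable(delta, start):
    """Set of states reachable from start; delta maps (state, symbol) -> state."""
    seen, queue = {start}, deque([start])
    while queue:
        q = queue.popleft()
        for (src, _symbol), dst in delta.items():
            if src == q and dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return seen

def is_empty(delta, start, finals):
    """True iff the DFA accepts no string at all."""
    return not (reachable(delta, start) & set(finals))
```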
32
The Infiniteness Problem
Is a given regular language infinite?
Start with a DFA for the language.
Key idea: if the DFA has n states, and the language contains any string of length n or more, then the language is infinite.
Otherwise, the language is surely finite: it is limited to strings of length less than n.
33
Proof of Key Idea
If an n-state DFA accepts a string w of length n or
more, then there must be a state that appears twice
on the path labeled w from the start state to a final
state.
Because there are at least n+1 states along the path.
34
Proof – (2)
[Figure: the path for an input w with |w| = 5, drawn as a chain of states s0, s1, s2, s3, s4 joined by edges numbered 1 to 4.]
35
Finding Cycles
1. Eliminate states not reachable from the start state.
2. Eliminate states that do not reach a final state.
3. Test if the remaining transition graph has any cycles.
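The three steps translate directly into two graph closures and a depth-first search. This is a sketch under an assumed encoding (a dictionary from (state, symbol) to state); the name `is_infinite` and the helper names are illustrative.

```python
def is_infinite(delta, start, finals):
    """True iff the DFA's language is infinite."""
    succ, pred = {}, {}
    for (src, _symbol), dst in delta.items():
        succ.setdefault(src, set()).add(dst)
        pred.setdefault(dst, set()).add(src)

    def closure(seeds, edges):
        """States reachable from seeds along the given edge map."""
        seen, stack = set(seeds), list(seeds)
        while stack:
            q = stack.pop()
            for nxt in edges.get(q, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    # Steps 1 and 2: keep states reachable from the start that can reach a final state.
    useful = closure({start}, succ) & closure(set(finals), pred)

    # Step 3: depth-first search for a cycle among the remaining states.
    WHITE, GREY, BLACK = 0, 1, 2
    color = {q: WHITE for q in useful}

    def has_cycle(q):
        color[q] = GREY                       # q is on the current search path
        for nxt in succ.get(q, ()):
            if nxt not in useful:
                continue
            if color[nxt] == GREY:            # back edge: a cycle exists
                return True
            if color[nxt] == WHITE and has_cycle(nxt):
                return True
        color[q] = BLACK
        return False

    return any(color[q] == WHITE and has_cycle(q) for q in useful)
```

A cycle among the remaining states can be traversed any number of times on the way to a final state, which is why it yields infinitely many accepted strings.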
36
The Pumping Lemma
We have, almost accidentally, proved a statement
that is quite useful for showing certain languages
are not regular.
Called the pumping lemma for regular languages.
37
Statement of the Pumping Lemma
For every regular language L
There is an integer n (the number of states of a DFA for L), such that
For every string w in L of length ≥ n
We can write w = xyz such that:
1. |xy| ≤ n.
2. |y| > 0.
3. For all i ≥ 0, xy^i z is in L.
Here y is the string of labels along the first cycle on the path labeled w.
38
Example: Use of Pumping Lemma
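The pumping lemma is a necessary but not a sufficient condition for a language to be regular.
Every regular language satisfies the pumping lemma.
If a language does not satisfy the pumping lemma, it is certainly not regular; if it does satisfy the lemma, it is still not necessarily regular.
The lemma is therefore used to prove that a language is not regular.
Example: {0^k 1^k | k ≥ 1} is not a regular language.
Suppose it were. Then there would be an associated n for the pumping lemma.
Let w = 0^n 1^n. We can write w = xyz, where |xy| ≤ n, so x and y consist of 0’s, and y ≠ ε.
But then xyyz would be in L, and this string has more 0’s than 1’s.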
39
Decision Property: Equivalence
Given regular languages L and M, is L = M?
Algorithm involves constructing the product
DFA from DFA’s for L and M.
Let these DFA’s have sets of states Q and R,
respectively.
Product DFA has set of states Q × R, i.e., pairs [q, r] with q in Q, r in R.
40
Product DFA – Continued
Start state = [q0, r0] (the start states of the DFA’s for
L, M).
Transitions: δ([q,r], a) = [δL(q,a), δM(r,a)].
δL, δM are the transition functions for the DFA’s of L, M.
That is, we simulate the two DFA’s in the two state
components of the product DFA.
41
Example: Product DFA
[Figure: a DFA with states A and B, a DFA with states C and D, and their product DFA with states [A,C], [A,D], [B,C], and [B,D]; all edges are labeled 0 or 1.]
42
Equivalence Algorithm
Make the final states of the product DFA be
those states [q, r] such that exactly one of q and
r is a final state of its own DFA.
Thus, the product accepts w iff w is in exactly
one of L and M.
43
Example: Equivalence
[Figure: a DFA with states A and B, a DFA with states C and D, and their product DFA with states [A,C], [A,D], [B,C], and [B,D]; all edges are labeled 0 or 1.]
44
Equivalence Algorithm – (2)
The product DFA’s language is empty iff L = M.
But we already have an algorithm to test
whether the language of a DFA is empty.
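The equivalence test can be sketched without materializing the whole product: explore the reachable pairs [q, r] and stop as soon as one of them is a final state of the product, that is, exactly one component is final. The triple encoding (delta, start, finals) and the name `equivalent` are assumptions for illustration, and both transition functions are assumed to be total over the given alphabet.

```python
from collections import deque

def equivalent(dfa1, dfa2, alphabet):
    """True iff the two DFAs accept the same language."""
    (d1, s1, f1), (d2, s2, f2) = dfa1, dfa2
    seen, queue = {(s1, s2)}, deque([(s1, s2)])
    while queue:
        q, r = queue.popleft()
        if (q in f1) != (r in f2):        # reachable final state of the product
            return False
        for a in alphabet:
            nxt = (d1[(q, a)], d2[(r, a)])   # simulate both DFAs in lockstep
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True
```

At most |Q| · |R| pairs are visited, so the search always terminates.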
45
Decision Property: Containment
Given regular languages L and M, is L ⊆ M?
Algorithm also uses the product automaton.
How do you define the final states [q, r] of the product so its language is empty iff L ⊆ M?
Answer: q is final; r is not.
46
Example: Containment
[Figure: a DFA with states A and B, a DFA with states C and D, and their product DFA with states [A,C], [A,D], [B,C], and [B,D]; all edges are labeled 0 or 1.]
Note: the only final state is unreachable, so containment holds.
47
The Minimum-State DFA for a
Regular Language
In principle, since we can test for equivalence
of DFA’s we can, given a DFA A find the DFA
with the fewest states accepting L(A).
Test all smaller DFA’s for equivalence with A.
But that’s a terrible algorithm.
48
Efficient State Minimization
Table-filling method
Indistinguishable states:
        r   b
    A   B   C
    B   B   C
Mark as many distinguishable pairs of states as possible.
49
Efficient State Minimization
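Basis: if p is an accepting state and q is a non-accepting state, then {p, q} is distinguishable.
Induction: for states p and q, if r = δ(p, a) and s = δ(q, a) are distinguishable for some input symbol a, then p and q are distinguishable.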
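The basis and induction can be run to a fixed point, as in the sketch below. The function names and the dictionary encoding are assumptions; the transition table is the seven-state DFA (states A to G, inputs r and b, accepting states F and G) used in the state minimization example of these slides.

```python
from itertools import combinations

def distinguishable_pairs(states, alphabet, delta, finals):
    """Table-filling: return the set of distinguishable pairs as frozensets."""
    # Basis: one state accepting, the other not.
    marked = {frozenset(p) for p in combinations(states, 2)
              if (p[0] in finals) != (p[1] in finals)}
    changed = True
    while changed:                    # induction, repeated until nothing new is marked
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            if pair in marked:
                continue
            for a in alphabet:
                succ = frozenset((delta[(p, a)], delta[(q, a)]))
                if len(succ) == 2 and succ in marked:
                    marked.add(pair)
                    changed = True
                    break
    return marked

def indistinguishable_pairs(states, alphabet, delta, finals):
    """Pairs left unmarked by the table-filling procedure."""
    marked = distinguishable_pairs(states, alphabet, delta, finals)
    return [(p, q) for p, q in combinations(states, 2)
            if frozenset((p, q)) not in marked]

# Row X: 'YZ' means δ(X, r) = Y and δ(X, b) = Z.
ROWS = {'A': 'BC', 'B': 'DE', 'C': 'DF', 'D': 'DG', 'E': 'DG', 'F': 'DC', 'G': 'DG'}
DELTA = {}
for state, (on_r, on_b) in ROWS.items():
    DELTA[(state, 'r')] = on_r
    DELTA[(state, 'b')] = on_b

print(indistinguishable_pairs('ABCDEFG', 'rb', DELTA, {'F', 'G'}))  # [('D', 'E')]
```

The repeated sweep is simple rather than fast; it is meant to mirror the slides, not to be an efficient minimizer.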
50
Example: State Minimization
Remember this DFA? It was constructed for the chessboard NFA by the subset construction.
                      r            b
    {1}               {2,4}        {5}
    {2,4}             {2,4,6,8}    {1,3,5,7}
    {5}               {2,4,6,8}    {1,3,7,9}
    {2,4,6,8}         {2,4,6,8}    {1,3,5,7,9}
    {1,3,5,7}         {2,4,6,8}    {1,3,5,7,9}
  * {1,3,7,9}         {2,4,6,8}    {5}
  * {1,3,5,7,9}       {2,4,6,8}    {1,3,5,7,9}
Here it is with more convenient state names:
        r   b
    A   B   C
    B   D   E
    C   D   F
    D   D   G
    E   D   G
  * F   D   C
  * G   D   G
51
Efficient State Minimization
[Figure, shown as eight animation frames: the transition table for states A to G beside a triangular table of state pairs, with rows B to G and columns A to F. Each frame marks further distinguishable pairs with X. In the last frame every pair is marked except {D, E}.]
59
Example – Concluded
Original DFA:
        r   b
    A   B   C
    B   D   E
    C   D   F
    D   D   G
    E   D   G
  * F   D   C
  * G   D   G
[Figure: the completed triangular table of state pairs; every pair is marked X except {D, E}.]
Replace D and E by H.
Result is the minimum-state DFA:
        r   b
    A   B   C
    B   H   H
    C   H   F
    H   H   G
  * F   H   C
  * G   H   G
60
Eliminating Unreachable States
Unfortunately, combining indistinguishable states
could leave us with unreachable states in the
“minimum-state” DFA.
Thus, before or after, remove states that are not
reachable from the start state.
61
Closure Under Union
If L and M are regular languages, so is L ∪ M.
Proof: Let L and M be the languages of regular expressions R and S, respectively.
Then R+S is a regular expression whose language is L ∪ M.
62
Closure Under Concatenation and
Kleene Closure
Same idea:
RS is a regular expression whose language is LM.
R* is a regular expression whose language is L*.
63
Closure Under Intersection
If L and M are regular languages, then so is L ∩ M.
Proof: Let A and B be DFA’s whose languages
are L and M, respectively.
Construct C, the product automaton of A and B.
Make the final states of C be the pairs consisting
of final states of both A and B.
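The construction in this proof can be sketched as a function that builds the reachable part of the product automaton and chooses its final states with a predicate. The triple encoding (delta, start, finals), the names `product_dfa`, `run`, and `both_final`, and the two sample DFAs are assumptions for illustration; both transition functions are assumed to be total.

```python
def product_dfa(dfa_a, dfa_b, alphabet, keep):
    """Product of two DFAs; keep(a_is_final, b_is_final) selects final pairs."""
    (da, sa, fa), (db, sb, fb) = dfa_a, dfa_b
    start = (sa, sb)
    delta, finals = {}, set()
    seen, todo = {start}, [start]
    while todo:
        q, r = todo.pop()
        if keep(q in fa, r in fb):
            finals.add((q, r))
        for a in alphabet:
            nxt = (da[(q, a)], db[(r, a)])   # move both components on the same symbol
            delta[((q, r), a)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return delta, start, finals

def run(dfa, w):
    """Simulate a DFA given as (delta, start, finals) on the string w."""
    delta, state, finals = dfa
    for ch in w:
        state = delta[(state, ch)]
    return state in finals

def both_final(a_is_final, b_is_final):
    """Final-state rule for intersection."""
    return a_is_final and b_is_final

# Assumed sample inputs: an even number of 0's, and strings ending in 1.
EVEN_ZEROS = ({('e', '0'): 'o', ('e', '1'): 'e',
               ('o', '0'): 'e', ('o', '1'): 'o'}, 'e', {'e'})
ENDS_IN_ONE = ({('n', '0'): 'n', ('n', '1'): 'y',
                ('y', '0'): 'n', ('y', '1'): 'y'}, 'n', {'y'})
INTERSECTION = product_dfa(EVEN_ZEROS, ENDS_IN_ONE, '01', both_final)
```

Passing a different predicate, such as `lambda a, b: a and not b`, gives the product with the final states used for difference.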
64
Example: Product DFA for Intersection
[Figure: a DFA with states A and B, a DFA with states C and D, and their product DFA with states [A,C], [A,D], [B,C], and [B,D]; all edges are labeled 0 or 1.]
65
Closure Under Difference
If L and M are regular languages, then so is L –
M = strings in L but not M.
Proof: Let A and B be DFA’s whose languages
are L and M, respectively.
Construct C, the product automaton of A and B.
Make the final states of C be the pairs where the A-state is final but the B-state is not.
66
Example: Product DFA for Difference
[Figure: a DFA with states A and B, a DFA with states C and D, and their product DFA with states [A,C], [A,D], [B,C], and [B,D]; all edges are labeled 0 or 1.]
67
Closure Under Complementation
The complement of a language L (with respect to an alphabet Σ such that Σ* contains L) is Σ* – L.
Since Σ* is surely regular, the complement of a regular language is always regular, by closure under difference.
68
Closure Under Reversal
Recall example of a DFA that accepted the
binary strings that, as integers were divisible by
23.
We said that the language of binary strings
whose reversal was divisible by 23 was also
regular, but the DFA construction was very
tricky.
Good application of reversal-closure.
69
Closure Under Reversal – (2)
Given language L, L^R is the set of strings whose reversal is in L.
Example: L = {0, 01, 100}; L^R = {0, 10, 001}.
Proof: Let E be a regular expression for L.
We show how to reverse E, to provide a regular expression E^R for L^R.
70
Reversal of a Regular Expression
Basis: If E is a symbol a, ε, or ∅, then E^R = E.
Induction: If E is
F+G, then E^R = F^R + G^R.
FG, then E^R = G^R F^R.
F*, then E^R = (F^R)*.
71
Example: Reversal of a RE
Let E = 01* + 10*.
E^R = (01* + 10*)^R = (01*)^R + (10*)^R
= (1*)^R 0^R + (0*)^R 1^R
= (1^R)*0 + (0^R)*1
= 1*0 + 0*1.
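The reversal rules are a structural recursion, which the sketch below applies to the expression 01* + 10*. The tuple encoding of expressions and the names `reverse`, `show`, and `wrap` are assumptions for illustration.

```python
def reverse(expr):
    """expr is a str (a symbol, 'ε' or '∅') or ('+', F, G), ('.', F, G), ('*', F)."""
    if isinstance(expr, str):
        return expr                                        # basis: unchanged
    op = expr[0]
    if op == '+':
        return ('+', reverse(expr[1]), reverse(expr[2]))
    if op == '.':
        return ('.', reverse(expr[2]), reverse(expr[1]))   # operands swap places
    if op == '*':
        return ('*', reverse(expr[1]))
    raise ValueError(op)

def show(expr):
    """Render an expression tree in the slides' notation."""
    if isinstance(expr, str):
        return expr
    op = expr[0]
    if op == '+':
        return show(expr[1]) + ' + ' + show(expr[2])
    if op == '.':
        return ''.join(wrap(e) for e in expr[1:])
    return wrap(expr[1]) + '*'

def wrap(expr):
    """Parenthesize any operand that is not a single symbol or a starred term."""
    text = show(expr)
    if isinstance(expr, str) or expr[0] == '*':
        return text
    return '(' + text + ')'

E = ('+', ('.', '0', ('*', '1')), ('.', '1', ('*', '0')))   # 01* + 10*
print(show(reverse(E)))   # 1*0 + 0*1
```

Reversing twice returns the original tree, which mirrors the fact that (L^R)^R = L.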
72
Homomorphisms
A homomorphism on an alphabet is a function that
gives a string for each symbol in that alphabet.
Example: h(0) = ab; h(1) = ε.
Extend to strings by h(a1…an) = h(a1)…h(an).
Example: h(01010) = ababab.
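Extending h from symbols to strings is a one-line concatenation. In this sketch the homomorphism is stored as a dictionary and the empty Python string stands for ε; both choices, and the name `apply_hom`, are assumptions for illustration.

```python
def apply_hom(h, w):
    """h(a1...an) = h(a1)...h(an): concatenate the images of the symbols of w."""
    return ''.join(h[symbol] for symbol in w)

H = {'0': 'ab', '1': ''}          # h(0) = ab, h(1) = ε

print(apply_hom(H, '01010'))      # ababab
```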
73
Closure Under Homomorphism
If L is a regular language, and h is a
homomorphism on its alphabet, then h(L) = {h(w)
| w is in L} is also a regular language.
Proof: Let E be a regular expression for L.
Apply h to each symbol in E.
Language of resulting RE is h(L).
74
Example: Closure under
Homomorphism
Let h(0) = ab; h(1) = ε.
Let L be the language of regular expression 01* +
10*.
Then h(L) is the language of regular expression
abε* + ε(ab)*.
Note: use parentheses
to enforce the proper
grouping.
75
Example – Continued
abε* + ε(ab)* can be simplified.
ε* = ε, so abε* = abε.
ε is the identity under concatenation.
That is, εE = Eε = E for any RE E.
Thus, abε* + ε(ab)* = abε + ε(ab)* = ab + (ab)*.
Finally, L(ab) is contained in L((ab)*), so a RE for
h(L) is (ab)*.
76
Inverse Homomorphisms
Let h be a homomorphism and L a language over the output alphabet of h.
h^-1(L) = {w | h(w) is in L}.
77
Example: Inverse Homomorphism
Let h(0) = ab; h(1) = ε.
Let L = {abab, baba}.
h^-1(L) = the language of strings with two 0’s and any number of 1’s = L(1*01*01*).
Notice: no string maps to baba; any string with exactly two 0’s maps to abab.
78
Closure Proof for Inverse
Homomorphism
Start with a DFA A for L.
Construct a DFA B for h^-1(L) with:
The same set of states.
The same start state.
The same final states.
Input alphabet = the symbols to which homomorphism h applies.
79
Proof – (2)
The transitions for B are computed by applying h to
an input symbol a and seeing where A would go on
sequence of input symbols h(a).
Formally, δB(q, a) = δA(q, h(a)).
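The rule δB(q, a) = δA(q, h(a)) can be sketched as follows. The triple encoding (delta, start, finals), the function names, and the sample DFA A, which accepts only the string abab, are assumptions for illustration; A's transition function is assumed to be total.

```python
def inverse_hom_dfa(dfa_a, h):
    """Build B for h^-1(L) from a DFA A for L; h maps symbols to strings."""
    delta_a, start, finals = dfa_a
    states = {q for q, _ in delta_a} | set(delta_a.values())
    delta_b = {}
    for q in states:
        for a, image in h.items():
            p = q
            for ch in image:              # follow A along the string h(a)
                p = delta_a[(p, ch)]
            delta_b[(q, a)] = p           # δB(q, a) = δA(q, h(a))
    return delta_b, start, finals         # same start state, same final states

def run(dfa, w):
    """Simulate a DFA given as (delta, start, finals) on the string w."""
    delta, state, finals = dfa
    for ch in w:
        state = delta[(state, ch)]
    return state in finals

# Assumed DFA A for {abab}: states 0..4 count matched symbols, 'dead' traps.
TARGET = 'abab'
DELTA_A = {}
for i in range(len(TARGET) + 1):
    for ch in 'ab':
        matches = i < len(TARGET) and TARGET[i] == ch
        DELTA_A[(i, ch)] = i + 1 if matches else 'dead'
for ch in 'ab':
    DELTA_A[('dead', ch)] = 'dead'
A = (DELTA_A, 0, {len(TARGET)})

B = inverse_hom_dfa(A, {'0': 'ab', '1': ''})   # h(0) = ab, h(1) = ε
```

With this choice of A, B accepts exactly the strings with two 0's and any number of 1's; when h(a) = ε the inner loop does nothing, so the a-transition is a self-loop.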
80
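Example: Inverse Homomorphism Construction
[Figure: a DFA with states A, B, C over {a, b} and the DFA constructed from it over {0, 1}, with h(0) = ab and h(1) = ε. Annotations: since h(1) = ε, each 1-transition is a self-loop; since h(0) = ab, each 0-transition follows the path labeled ab in the original DFA.]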
81
Proof – (3)
Induction on |w| shows that δB(q0, w) = δA(q0, h(w)).
Basis: w = ε.
δB(q0, ε) = q0, and δA(q0, h(ε)) = δA(q0, ε) = q0.
82
Proof – (4)
Induction: Let w = xa; assume IH for x.
δB(q0, w) = δB(δB(q0, x), a).
= δB(δA(q0, h(x)), a) by the IH.
= δA(δA(q0, h(x)), h(a)) by definition of the DFA B.
= δA(q0, h(x)h(a)) by definition of the extended
delta.
= δA(q0, h(w)) by def. of homomorphism.
83
Summary of Regular Languages
Cong Tian
84
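Regular Languages
Deterministic finite automaton (DFA)
Nondeterministic finite automaton (NFA)
Nondeterministic finite automaton with ε-transitions (ε-NFA)
Regular expression (RE)
L(RE) = L(ε-NFA) = L(NFA) = L(DFA) = the regular languages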
85
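Properties of Regular Languages
Pumping lemma (a necessary but not sufficient condition)
It can be used to prove that a particular language is not regular.
It cannot be used to prove that a particular language is regular.
Decidability
Is the language accepted by an automaton empty?
Is a string w accepted by a given automaton?
Are two automata equivalent?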
86
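Properties of Regular Languages
Closure properties
Regular languages are closed under:
Union
Intersection
Complement
Difference
Reversal
Closure (Kleene star)
Concatenation
Homomorphism
Inverse homomorphism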
87