Compiler Principles Lecture Slides - Xidian University Personal Homepage

Lexical Analysis: Summary
Tian Cong (田聪)
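General Method and Steps for Constructing a Lexical Analyzer
1. Describe: describe each token pattern with a regular expression;
2. Construct NFAs: build an NFA for each regular expression;
3. Determinize: convert the NFA into an equivalent DFA;
4. Minimize: optimize the DFA so that it has as few states as possible;
5. Build the lexical analyzer: construct the lexer from the DFA (table-driven, direct coding, or with LEX).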
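Formal Concepts Involved
 Regular expressions
 NFA
 DFA
[Figure: a regular language (regular set) is a set of strings such as abb, A111, …; it can be described by a regular expression, an NFA, or a DFA, e.g. char(char|digit)*, …]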
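[Figure: the Chomsky hierarchy as nested language classes: recursively enumerable languages ⊃ context-sensitive languages ⊃ context-free languages ⊃ regular languages]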
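Language Classes and Their Descriptions
 Regular languages
   Regular expressions
   Finite-state automata
 Context-free languages
   Context-free grammars
   Nondeterministic pushdown automata
 Context-sensitive languages
   Context-sensitive grammars
   Linear bounded automata (a special kind of Turing machine)
 Recursively enumerable languages
   Phrase-structure grammars
   Turing machines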
Algebraic Laws for RE’s
 Union and concatenation behave like addition and multiplication.
 + is commutative and associative:
   a+b = b+a, (a+b)+c = a+(b+c)
 Concatenation is associative:
   (a.b).c = a.(b.c)
 Concatenation distributes over +:
   a.(b+c) = a.b + a.c
 Exception: concatenation is not commutative:
   a.b ≠ b.a
Identities and Annihilators
 ∅ is the identity for +:
   R + ∅ = R.
 ε is the identity for concatenation:
   εR = Rε = R.
 ∅ is the annihilator for concatenation:
   ∅R = R∅ = ∅.
Regular Languages: Related Topics
 How do we prove that regular expressions and NFAs are equivalent?
 (1) We need to show that for every RE, there is an automaton that accepts the same language.
 (2) And for every automaton, there is an RE defining its language.
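RE to ε-NFA: Basis
 Symbol a: [Figure: a start state with a single transition labeled a to a final state]
 ε: [Figure: a start state with a single transition labeled ε to a final state]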
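RE to ε-NFA: Induction 1 – Union
[Figure: the automaton for E1 + E2: a new start state with ε-transitions to the start states of the automata for E1 and for E2, and ε-transitions from their final states to a new final state]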
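RE to ε-NFA: Induction 2 – Concatenation
[Figure: the automaton for E1E2: the automaton for E1 followed by the automaton for E2, joined by an ε-transition from the final state of E1's automaton to the start state of E2's automaton]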
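RE to ε-NFA: Induction 3 – Closure
[Figure: the automaton for E*: a new start state and a new final state; ε-transitions go from the new start state to the start of E's automaton and directly to the new final state, and from the final state of E's automaton back to its start and on to the new final state]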
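The basis and the three inductive cases can be turned into a small recursive builder. This sketch assumes a hypothetical tuple encoding of REs (`("sym", a)`, `("eps",)`, `("union", E1, E2)`, `("concat", E1, E2)`, `("star", E)`); the helper names are illustrative, not from the slides. It builds the ε-NFA exactly as in the figures (one start and one final state per fragment) and then simulates it by tracking the set of current states. ∅ is omitted because the slides give no basis case for it.

```python
import itertools

EPS = ""  # label used for ε-transitions (never equal to an input symbol)

def thompson(e, edges, fresh):
    """Build an ε-NFA fragment for the regex AST e.

    Appends (src, label, dst) triples to `edges` and returns the fragment's
    (start, final) states; `fresh` yields unused state numbers.
    """
    kind = e[0]
    if kind in ("sym", "eps"):                 # basis: one labelled edge
        s, f = next(fresh), next(fresh)
        edges.append((s, e[1] if kind == "sym" else EPS, f))
        return s, f
    if kind == "union":                        # E1 + E2
        s, f = next(fresh), next(fresh)
        for sub in (e[1], e[2]):
            ss, sf = thompson(sub, edges, fresh)
            edges.extend([(s, EPS, ss), (sf, EPS, f)])
        return s, f
    if kind == "concat":                       # E1E2
        s1, f1 = thompson(e[1], edges, fresh)
        s2, f2 = thompson(e[2], edges, fresh)
        edges.append((f1, EPS, s2))
        return s1, f2
    if kind == "star":                         # E*
        s, f = next(fresh), next(fresh)
        ss, sf = thompson(e[1], edges, fresh)
        edges.extend([(s, EPS, ss), (s, EPS, f), (sf, EPS, ss), (sf, EPS, f)])
        return s, f
    raise ValueError(f"unknown node {kind!r}")

def eps_closure(states, edges):
    """All states reachable from `states` using ε-edges only."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for src, label, dst in edges:
            if src == q and label == EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(e, w):
    """Build the ε-NFA for e, then simulate it on w with sets of states."""
    edges = []
    start, final = thompson(e, edges, itertools.count())
    current = eps_closure({start}, edges)
    for a in w:
        moved = {dst for src, label, dst in edges if src in current and label == a}
        current = eps_closure(moved, edges)
    return final in current
```

For instance, (a+b)*abb is encoded as `("concat", ("concat", ("concat", ("star", ("union", ("sym", "a"), ("sym", "b"))), ("sym", "a")), ("sym", "b")), ("sym", "b"))`.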
From Automata to RE
 Arden’s rule
 For any sets of strings S and T, the equation X = SX + T has X = S*T as a solution. Moreover, this solution is unique if ε is not in S.
From Automata to RE
 Given an automaton A:
 A has states {q0, …, qn}, with q0 being the start state.
 Let Xi denote the set of strings accepted by A starting in state qi.
 Thus, L(A) = X0.
 We can write an equation for each Xi, defining it in terms of the sets corresponding to its successor states.
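From Automata to RE
[Figure: automaton A0 over {a, b, c} with start state q0, final state q1, and dead state q3. Transitions: q0 –a→ q1, q1 –b→ q2, q1 –c→ q0, q2 –c→ q0; every other transition goes to q3, which loops on a, b, c]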
From Automata to RE
[Figure: automaton A0 (repeated)]
Equations for A0:
(0) X0 = aX1 + bX3 + cX3
(1) X1 = aX3 + bX2 + cX0 + ε
(2) X2 = aX3 + bX3 + cX0
(3) X3 = aX3 + bX3 + cX3
X3 = (a+b+c)X3 + ∅, so by Arden’s rule:
X3 = (a+b+c)*∅ = ∅
Since ∅ annihilates concatenation and is the identity for +, the X3 terms vanish:
(0) X0 = aX1
(1) X1 = bX2 + cX0 + ε
(2) X2 = cX0
Substituting (0) and (2) in (1):
X1 = (bc+c)aX1 + ε
   = ((bc+c)a)*   (by Arden’s rule)
X0 = a((bc+c)a)*
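As a sanity check on the derivation, one can compare A0 with the resulting expression on all short strings. This sketch assumes the transitions read off equations (0) to (3) (with q1 final, since only X1 contains ε) and translates a((bc+c)a)* into Python's `re` syntax, where + becomes |. Agreement on bounded strings is evidence, not a proof.

```python
import itertools
import re

# Transitions of A0 as read off equations (0)-(3); q3 is the dead state.
DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q3", ("q0", "c"): "q3",
    ("q1", "a"): "q3", ("q1", "b"): "q2", ("q1", "c"): "q0",
    ("q2", "a"): "q3", ("q2", "b"): "q3", ("q2", "c"): "q0",
    ("q3", "a"): "q3", ("q3", "b"): "q3", ("q3", "c"): "q3",
}
FINAL = {"q1"}  # X1 is the only equation with an ε term

def dfa_accepts(w):
    """Run A0 on w from q0."""
    state = "q0"
    for ch in w:
        state = DELTA[(state, ch)]
    return state in FINAL

# a((bc+c)a)* in Python's re syntax.
PATTERN = re.compile(r"a((bc|c)a)*")

def first_disagreement(max_len):
    """Return a string on which A0 and the RE disagree, or None."""
    for n in range(max_len + 1):
        for t in itertools.product("abc", repeat=n):
            w = "".join(t)
            if dfa_accepts(w) != bool(PATTERN.fullmatch(w)):
                return w
    return None
```

`first_disagreement(8)` checks all 9841 strings of length at most 8 over {a, b, c}.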
DFA-to-RE
 Another approach
 Page 93, Theorem 3.4 (形式语言与自动机, Formal Languages and Automata).
 Induction on k-paths.
Decision Properties of Regular Languages
Properties of Language Classes
 A language class is a set of languages.
 We have one example of a language class: the regular languages.
 Every regular expression denotes a language; the languages denoted by all regular expressions form a language class, the regular languages.
 Language classes have two important kinds of properties:
1. Decision properties.
2. Closure properties.
Decision Properties
A decision property for a class of languages
is an algorithm that takes a formal
description of a language (e.g., a DFA) and
tells whether or not some property holds.
Example: Is language L empty?
Suppose the representation is a DFA.
Can you tell if L(A) = ∅ for DFA A?
Why Decision Properties?
 When we talked about protocols represented as
DFA’s, we noted that important properties of a
good protocol were related to the language of the
DFA.
 Example: “Does the protocol terminate?” = “Is
the language finite?”
 Example: “Can the protocol fail?” = “Is the
language nonempty?”
Why Decision Properties – (2)
 We might want a “smallest” representation for
a language, e.g., a minimum-state DFA or a
shortest RE.
 If you can’t decide “Are these two languages the same?”, i.e., “Do two DFA’s define the same language?”, then you can’t find the “smallest.”
Closure Properties
 A closure property (封闭性) of a language class says
that given languages in the class, an operator (e.g.,
union) produces another language in the same class.
 Example: the regular languages are obviously
closed under union, concatenation, and (Kleene)
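closure. (What about complement? Intersection?)
Recall the definition of regular expressions over Σ:
 ε is a regular expression.
 If a is a symbol in Σ, then a is a regular expression.
 If r and s are regular expressions over Σ, then
  (a) r|s is a regular expression;
  (b) rs is a regular expression;
  (c) r* is a regular expression.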
The Membership Question
 Our first decision property is the question: “Is string w in regular language L?” (the membership problem)
 Assume L is represented by a DFA A.
 Simulate the action of A on the sequence of input symbols forming w.
Example: Testing Membership
[Figures (several animation steps): the input 01011 is read one symbol at a time by a DFA over {0, 1} with start state A and further states B and C; each step marks the next symbol and the current state]
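The simulation can be sketched in a few lines. The transition table is an assumption reconstructed from the edge labels in the figure (A –0→ A, A –1→ B, B –0→ A, B –1→ C, C –0,1→ C), and C is assumed to be the only final state, so this DFA accepts the strings that contain 11.

```python
# Assumed DFA (reconstructed from the figure's edge labels; C assumed final).
DELTA = {
    ("A", "0"): "A", ("A", "1"): "B",
    ("B", "0"): "A", ("B", "1"): "C",
    ("C", "0"): "C", ("C", "1"): "C",
}
START, FINAL = "A", {"C"}

def trace(w):
    """Return the list of states visited while reading w, start state first."""
    states = [START]
    for symbol in w:
        states.append(DELTA[(states[-1], symbol)])
    return states

def member(w):
    """w is in L(A) iff the simulation ends in a final state."""
    return trace(w)[-1] in FINAL
```

Under these assumptions, `trace("01011")` visits A, A, B, A, B, C, so 01011 is accepted. The test takes time linear in |w|.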
The Emptiness Problem
 Given a regular language, does the language contain no string at all? (the emptiness problem)
 Assume the representation is a DFA.
 Construct the transition graph.
 Compute the set of states reachable from the start state.
 If any final state is reachable, the language is nonempty; otherwise it is empty.
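A minimal sketch of the reachability test, assuming a DFA encoded as a dictionary `delta` that maps each state to a `{symbol: next_state}` dictionary (a hypothetical encoding chosen for this example). Breadth-first search computes the reachable states; the language is empty iff none of them is final.

```python
from collections import deque

def reachable(start, delta):
    """States reachable from `start`; delta maps state -> {symbol: next_state}."""
    seen, queue = {start}, deque([start])
    while queue:
        q = queue.popleft()
        for nxt in delta.get(q, {}).values():
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def is_empty(start, delta, finals):
    """L(A) is empty iff no final state is reachable from the start state."""
    return reachable(start, delta).isdisjoint(finals)
```

For a made-up DFA `delta = {"p": {"0": "p", "1": "q"}, "q": {"0": "q", "1": "q"}, "r": {"0": "r", "1": "r"}}`, `is_empty("p", delta, {"r"})` is True because r cannot be reached from p.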
The Infiniteness Problem
 Is a given regular language infinite?
 Start with a DFA for the language.
 Key idea: if the DFA has n states, and the
language contains any string of length n or
more, then the language is infinite.
 Otherwise, the language is surely finite.
 It is limited to strings of length less than n.
Proof of Key Idea
 If an n-state DFA accepts a string w of length n or
more, then there must be a state that appears twice
on the path labeled w from the start state to a final
state.
 Because there are at least n+1 states along the path.
Proof – (2)
[Figure: the path labeled w, with |w| = 5, through states s0, s1, s2, s3, s4, …]
Finding Cycles
1. Eliminate states not reachable from the start state.
2. Eliminate states that do not reach a final state.
3. Test if the remaining transition graph has any cycles.
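The three steps can be sketched as follows, assuming the same hypothetical encoding `delta[state] = {symbol: next_state}`. Steps 1 and 2 are forward and backward reachability; step 3 uses a depth-first search with three colours, one standard way to detect a cycle in a directed graph. Helper names are illustrative.

```python
def _reach(sources, succ):
    """All nodes reachable from `sources` in a graph given as node -> set of nodes."""
    seen, stack = set(sources), list(sources)
    while stack:
        for nxt in succ.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def is_infinite(start, delta, finals):
    """Decide whether L(A) is infinite for a DFA given as state -> {symbol: next}."""
    succ = {q: set(moves.values()) for q, moves in delta.items()}
    pred = {}
    for q, nexts in succ.items():
        for n in nexts:
            pred.setdefault(n, set()).add(q)
    # Steps 1 and 2: keep states reachable from start that can also reach a final state.
    useful = _reach({start}, succ) & _reach(set(finals), pred)
    # Step 3: look for a cycle among the remaining states.
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {q: WHITE for q in useful}

    def has_cycle(q):
        colour[q] = GREY                 # q is on the current DFS path
        for n in succ.get(q, ()):
            if n not in useful:
                continue
            if colour[n] == GREY:        # back edge: a cycle
                return True
            if colour[n] == WHITE and has_cycle(n):
                return True
        colour[q] = BLACK
        return False

    return any(colour[q] == WHITE and has_cycle(q) for q in useful)
```

Removing useless states first matters: a dead state that loops on every symbol forms a cycle, but it contributes no accepted strings.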
The Pumping Lemma
 We have, almost accidentally, proved a statement
that is quite useful for showing certain languages
are not regular.
 Called the pumping lemma for regular languages.
Statement of the Pumping Lemma
For every regular language L
there is an integer n (the number of states of a DFA for L), such that
for every string w in L of length ≥ n
we can write w = xyz such that:
1. |xy| ≤ n.
2. |y| > 0.
3. For all i ≥ 0, xy^iz is in L.
(y is the label along the first cycle on the path labeled w.)
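Example: Use of Pumping Lemma
The pumping lemma is a necessary but not sufficient condition for a language to be regular!
Every regular language must satisfy the pumping lemma.
If a language does not satisfy the pumping lemma, then it is certainly not regular. If it satisfies the pumping lemma, it is not necessarily regular.
 Used to prove that a language is not regular (a necessary but not sufficient condition).
 Example: {0^k1^k | k ≥ 1} is not a regular language.
 Suppose it were. Then there would be an associated n for the pumping lemma.
 Let w = 0^n1^n. We can write w = xyz, where x and y consist of 0’s (because |xy| ≤ n), and y ≠ ε.
 But then xyyz would be in L, and this string has more 0’s than 1’s, a contradiction.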
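The argument can be checked mechanically for particular values of n. This brute-force sketch (hypothetical helper names) tries every split w = xyz of w = 0^n1^n with |xy| ≤ n and |y| > 0 and confirms that pumping once (i = 2) always leaves L. It only illustrates the proof for small n; the proof itself covers every n.

```python
def in_L(s):
    """Membership in L = {0^k 1^k | k >= 1}."""
    k = len(s) // 2
    return k >= 1 and s == "0" * k + "1" * k

def every_split_fails(n):
    """True if no split of 0^n 1^n with |xy| <= n, |y| > 0 keeps x y^2 z in L."""
    w = "0" * n + "1" * n
    for i in range(n + 1):              # i = |x|
        for j in range(i + 1, n + 1):   # j = |xy|, so |y| = j - i > 0
            x, y, z = w[:i], w[i:j], w[j:]
            if in_L(x + y * 2 + z):
                return False
    return True
```

Because |xy| ≤ n, y always lies inside the leading block of 0’s, which is exactly why the pumped string gains 0’s but not 1’s.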
Pumping Lemma
 Languages that satisfy the pumping lemma but are not regular
 Jeffrey Jaffe’s (MIT) pumping lemma (a necessary and sufficient condition for regularity)
Decision Property: Equivalence
 Given regular languages L and M, is L = M?
 Algorithm involves constructing the product
DFA from DFA’s for L and M.
 Let these DFA’s have sets of states Q and R,
respectively.
 Product DFA has set of states Q × R.
 I.e., pairs [q, r] with q in Q, r in R.
Product DFA – Continued
 Start state = [q0, r0] (the start states of the DFA’s for
L, M).
 Transitions: δ([q,r], a) = [δL(q,a), δM(r,a)]
 δL, δM are the transition functions for the DFA’s of L, M.
 That is, we simulate the two DFA’s in the two state components of the product DFA.
Example: Product DFA
[Figure: a DFA with states A, B and a DFA with states C, D over {0, 1}, and their product DFA with states [A,C], [A,D], [B,C], [B,D]]
Equivalence Algorithm
 Make the final states of the product DFA be
those states [q, r] such that exactly one of q and
r is a final state of its own DFA.
 If L = M, then whenever one DFA accepts a string, the other accepts it too!
 Thus, the product accepts w iff w is in exactly one of L and M.
Example: Equivalence
[Figure: the product DFA with states [A,C], [A,D], [B,C], [B,D], whose final states are the pairs in which exactly one component is final]
Equivalence Algorithm – (2)
 The product DFA’s language is empty iff L = M.
 We already have an algorithm to test whether
the language of a DFA is empty.
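A sketch of the equivalence test, assuming each DFA is a complete tuple `(start, delta, finals)` with `delta[(state, symbol)]` (a hypothetical encoding). Rather than building all of Q × R, it explores only the reachable pairs breadth-first, which suffices because emptiness depends only on reachable states. When it reaches a pair with exactly one final component, it also reconstructs a shortest string that is in exactly one of L and M. The example DFAs are made up for illustration.

```python
from collections import deque

def product_differs(dfa1, dfa2, alphabet):
    """Return a shortest string in exactly one of L(dfa1), L(dfa2), or None."""
    (s1, d1, f1), (s2, d2, f2) = dfa1, dfa2
    start = (s1, s2)
    parent = {start: None}            # BFS tree: pair -> (previous pair, symbol)
    queue = deque([start])
    while queue:
        pair = queue.popleft()
        q, r = pair
        if (q in f1) != (r in f2):    # final state of the product DFA
            word = []
            while parent[pair] is not None:
                pair, a = parent[pair]
                word.append(a)
            return "".join(reversed(word))
        for a in alphabet:
            nxt = (d1[(q, a)], d2[(r, a)])
            if nxt not in parent:
                parent[nxt] = (pair, a)
                queue.append(nxt)
    return None                       # product language is empty

def equivalent(dfa1, dfa2, alphabet):
    return product_differs(dfa1, dfa2, alphabet) is None

# Hypothetical examples: two DFAs for "even number of 1s", and one for all strings.
EVEN = ("E", {("E", "0"): "E", ("E", "1"): "O",
              ("O", "0"): "O", ("O", "1"): "E"}, {"E"})
EVEN3 = (0, {(0, "0"): 0, (0, "1"): 1, (1, "0"): 1,
             (1, "1"): 2, (2, "0"): 2, (2, "1"): 1}, {0, 2})
ALL = ("U", {("U", "0"): "U", ("U", "1"): "U"}, {"U"})
```

Here `equivalent(EVEN, EVEN3, "01")` holds even though EVEN3 has a redundant state, while `product_differs(EVEN, ALL, "01")` returns `"1"`.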
Decision Property: Containment
 Given regular languages L and M, is L ⊆ M?
 Algorithm also uses the product automaton.
 How do you define the final states [q, r] of the product so its language is empty iff L ⊆ M?
Answer: q is final; r is not.
Example: Containment
[Figure: the product DFA with states [A,C], [A,D], [B,C], [B,D], whose final states are the pairs [q, r] with q final and r not final]
Note: the only final state is unreachable, so containment holds.
The Minimum-State DFA for a Regular Language
 In principle, since we can test for equivalence of DFA’s, we can, given a DFA A, find the DFA with the fewest states accepting L(A).
 Test all smaller DFA’s for equivalence with A.
 But that’s a terrible algorithm.
Efficient State Minimization
 The table-filling algorithm
 Indistinguishable states, e.g.:
      r  b
   A  B  C
   B  B  C
 Find as many distinguishable pairs of states as possible.
Efficient State Minimization
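 Basis: if p is an accepting state and q is a non-accepting state, then {p, q} is distinguishable.
 Induction: for states p and q, if r = δ(p, a) and s = δ(q, a) are distinguishable for some input symbol a, then p and q are distinguishable.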
Example: State Minimization
Remember this DFA? It was constructed for the chessboard NFA by the subset construction.
               r            b
 {1}           {2,4}        {5}
 {2,4}         {2,4,6,8}    {1,3,5,7}
 {5}           {2,4,6,8}    {1,3,7,9}
 {2,4,6,8}     {2,4,6,8}    {1,3,5,7,9}
 {1,3,5,7}     {2,4,6,8}    {1,3,5,7,9}
*{1,3,7,9}     {2,4,6,8}    {5}
*{1,3,5,7,9}   {2,4,6,8}    {1,3,5,7,9}
Here it is with more convenient state names:
     r  b
 A   B  C
 B   D  E
 C   D  F
 D   D  G
 E   D  G
*F   D  C
*G   D  G
Efficient State Minimization
[Figures (several steps): the table-filling algorithm applied to the DFA with states A to G above; the transition table is repeated on each slide, and the triangular table of state pairs is progressively filled with X marks for the pairs found to be distinguishable]
Example – Concluded
[Figure: the completed table; every pair of states is marked X except {D, E}]
Replace D and E by H.
Result is the minimum-state DFA:
     r  b
 A   B  C
 B   H  H
 C   H  F
 H   H  G
*F   H  C
*G   H  G
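The basis and induction rules can be sketched as a fixpoint loop over all pairs, here applied to the chessboard DFA from the example. This is the straightforward version that rescans every pair until nothing changes, not an optimized one; the function names are illustrative.

```python
from itertools import combinations

def distinguishable_pairs(states, alphabet, delta, finals):
    """Table-filling: return the set of distinguishable pairs (as frozensets).

    delta[(state, symbol)] -> state must be defined for every state and symbol.
    """
    # Basis: an accepting and a non-accepting state are distinguishable.
    marked = {frozenset((p, q)) for p, q in combinations(states, 2)
              if (p in finals) != (q in finals)}
    # Induction: repeat until a full pass marks no new pair.
    changed = True
    while changed:
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            if pair in marked:
                continue
            for a in alphabet:
                r, s = delta[(p, a)], delta[(q, a)]
                if r != s and frozenset((r, s)) in marked:
                    marked.add(pair)
                    changed = True
                    break
    return marked

def equivalent_pairs(states, alphabet, delta, finals):
    """Pairs left unmarked by the table-filling algorithm."""
    marked = distinguishable_pairs(states, alphabet, delta, finals)
    return {frozenset(p) for p in combinations(states, 2)} - marked

# The chessboard DFA from the example: row "XY" means r -> X, b -> Y.
TABLE = {"A": "BC", "B": "DE", "C": "DF", "D": "DG",
         "E": "DG", "F": "DC", "G": "DG"}
DELTA = {(q, a): row[i] for q, row in TABLE.items() for i, a in enumerate("rb")}
FINALS = {"F", "G"}
```

`equivalent_pairs("ABCDEFG", "rb", DELTA, FINALS)` leaves only {D, E} unmarked, matching the slide's merge of D and E into H.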
Eliminating Unreachable States
 Unfortunately, combining indistinguishable states
could leave us with unreachable states in the
“minimum-state” DFA.
 Thus, before or after, remove states that are not
reachable from the start state.
Closure Under Union
 If L and M are regular languages, so is L ∪ M.
 Proof: Let L and M be the languages of regular expressions R and S, respectively.
 Then R+S is a regular expression whose language is L ∪ M.
Closure Under Concatenation and Kleene Closure
 Same idea:
 RS is a regular expression whose language is LM.
 R* is a regular expression whose language is L*.
Closure Under Intersection
 If L and M are regular languages, then so is L ∩ M.
 Proof: Let A and B be DFA’s whose languages are L and M, respectively.
 Construct C, the product automaton of A and B.
 Make the final states of C be the pairs consisting of final states of both A and B.
Example: Product DFA for Intersection
[Figure: the product DFA with states [A,C], [A,D], [B,C], [B,D], whose final states are the pairs in which both components are final]
Closure Under Difference
 If L and M are regular languages, then so is L –
M = strings in L but not M.
 Proof: Let A and B be DFA’s whose languages
are L and M, respectively.
 Construct C, the product automaton of A and B.
 Make the final states of C be the pairs where the A-state is final but the B-state is not.
Example: Product DFA for Difference
[Figure: the product DFA with states [A,C], [A,D], [B,C], [B,D], whose final states are the pairs in which the first component is final and the second is not]
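Intersection and difference differ only in which pairs are final, so a sketch can take that choice as a parameter. Assumptions: complete DFAs encoded as `(start, delta, finals)` with `delta[(state, symbol)]`; the two example DFAs (at least one 1; at least one 0) are hypothetical and not necessarily the ones in the figures.

```python
from itertools import product

def product_dfa(dfa1, dfa2, alphabet, accept):
    """Build the product of two complete DFAs.

    accept(in_f1, in_f2) decides, from whether each component is final,
    whether the pair [q, r] is a final state of the product.
    """
    (s1, d1, f1), (s2, d2, f2) = dfa1, dfa2
    states1 = {q for q, _ in d1}
    states2 = {r for r, _ in d2}
    pairs = list(product(states1, states2))
    delta = {((q, r), a): (d1[(q, a)], d2[(r, a)]) for q, r in pairs for a in alphabet}
    finals = {(q, r) for q, r in pairs if accept(q in f1, r in f2)}
    return (s1, s2), delta, finals

def accepts(dfa, w):
    """Run a DFA on the string w."""
    state, delta, finals = dfa
    for a in w:
        state = delta[(state, a)]
    return state in finals

# Hypothetical example DFAs over {0, 1}.
HAS_1 = ("A", {("A", "0"): "A", ("A", "1"): "B",
               ("B", "0"): "B", ("B", "1"): "B"}, {"B"})
HAS_0 = ("C", {("C", "0"): "D", ("C", "1"): "C",
               ("D", "0"): "D", ("D", "1"): "D"}, {"D"})

both = product_dfa(HAS_1, HAS_0, "01", lambda x, y: x and y)         # L ∩ M
ones_only = product_dfa(HAS_1, HAS_0, "01", lambda x, y: x and not y)  # L – M
```

With these choices, `both` accepts strings containing both a 0 and a 1, and `ones_only` accepts the nonempty strings of 1’s.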
Closure Under Complementation
 The complement of a language L (with respect to an
alphabet Σ such that Σ* contains L) is Σ* – L.
 Since Σ* is surely regular, and regular languages are closed under difference, the complement of a regular language is always regular.
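The slide derives complementation from closure under difference. An equivalent and commonly used construction, sketched here, works directly on a DFA: if the DFA is complete (every state has a move on every symbol of Σ), swapping final and non-final states yields a DFA for Σ* – L. Completeness matters, because with missing transitions a string that "falls off" the automaton would stay rejected after the swap. The encoding and the example DFA (strings containing 11) are hypothetical.

```python
def complement(dfa, alphabet):
    """Complement a complete DFA (start, delta, finals) by swapping final states."""
    start, delta, finals = dfa
    states = {q for q, _ in delta} | set(delta.values())
    missing = [(q, a) for q in states for a in alphabet if (q, a) not in delta]
    if missing:
        raise ValueError(f"DFA is not complete; no move for {missing[0]}")
    return start, delta, states - set(finals)

def accepts(dfa, w):
    """Run a DFA on the string w."""
    state, delta, finals = dfa
    for a in w:
        state = delta[(state, a)]
    return state in finals

# Hypothetical complete DFA for strings containing 11.
HAS_11 = ("A", {("A", "0"): "A", ("A", "1"): "B",
                ("B", "0"): "A", ("B", "1"): "C",
                ("C", "0"): "C", ("C", "1"): "C"}, {"C"})
NO_11 = complement(HAS_11, "01")   # strings with no two consecutive 1s
```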
Closure Under Reversal – (2)
 Given language L, L^R is the set of strings whose reversal is in L.
 Example: L = {0, 01, 100}; L^R = {0, 10, 001}.
 Proof: Let E be a regular expression for L.
 We show how to reverse E, to provide a regular expression E^R for L^R.
Reversal of a Regular Expression
 Basis: If E is a symbol a, ε, or ∅, then E^R = E.
 Induction: If E is
   F+G, then E^R = F^R + G^R.
   FG, then E^R = G^R F^R.
   F*, then E^R = (F^R)*.
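The basis and induction translate directly into a recursive function. The tuple encoding of REs (`("sym", a)`, `("eps",)`, `("empty",)`, `("union", F, G)`, `("concat", F, G)`, `("star", F)`) and the printer `show` are assumptions made for this sketch.

```python
def reverse(e):
    """Return an RE for the reversal of L(e), following the basis and induction."""
    kind = e[0]
    if kind in ("sym", "eps", "empty"):      # basis: E^R = E
        return e
    if kind == "union":                      # (F+G)^R = F^R + G^R
        return ("union", reverse(e[1]), reverse(e[2]))
    if kind == "concat":                     # (FG)^R = G^R F^R
        return ("concat", reverse(e[2]), reverse(e[1]))
    if kind == "star":                       # (F*)^R = (F^R)*
        return ("star", reverse(e[1]))
    raise ValueError(f"unknown node {kind!r}")

def show(e):
    """Print an RE in the slides' notation (+, juxtaposition, *)."""
    kind = e[0]
    if kind == "sym":
        return e[1]
    if kind == "eps":
        return "ε"
    if kind == "empty":
        return "∅"
    if kind == "union":
        return f"{show(e[1])} + {show(e[2])}"
    if kind == "concat":                     # parenthesise unions inside a product
        return "".join(f"({show(x)})" if x[0] == "union" else show(x) for x in e[1:])
    inner = show(e[1])                       # star: parenthesise non-atoms
    return f"{inner}*" if e[1][0] in ("sym", "eps", "empty") else f"({inner})*"

# 01* + 10*, the expression used in the example
E = ("union", ("concat", ("sym", "0"), ("star", ("sym", "1"))),
              ("concat", ("sym", "1"), ("star", ("sym", "0"))))
```

`show(reverse(E))` gives `1*0 + 0*1`, the same result as the hand derivation.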
Example: Reversal of a RE
 Let E = 01* + 10*.
 E^R = (01* + 10*)^R = (01*)^R + (10*)^R
     = (1*)^R 0^R + (0*)^R 1^R
     = (1^R)*0 + (0^R)*1
     = 1*0 + 0*1.
Homomorphisms
 A homomorphism on an alphabet is a function that
gives a string for each symbol in that alphabet.
 Example: h(0) = ab; h(1) = ε.
 Extend to strings by h(a1…an) = h(a1)…h(an).
 Example: h(01010) = ababab.
Closure Under Homomorphism
 If L is a regular language, and h is a homomorphism on its alphabet, then h(L) = {h(w) | w is in L} is also a regular language.
 Proof: Let E be a regular expression for L.
 Apply h to each symbol in E.
 The language of the resulting RE is h(L).
Example: Closure under Homomorphism
 Let h(0) = ab; h(1) = ε.
 Let L be the language of regular expression 01* + 10*.
 Then h(L) is the language of regular expression abε* + ε(ab)*.
Example – Continued
 abε* + ε(ab)* can be simplified.
 ε* = ε, so abε* = abε.
 ε is the identity under concatenation.
 That is, εE = Eε = E for any RE E.
 Thus, abε* + ε(ab)* = abε + ε(ab)* = ab + (ab)*.
 Finally, L(ab) is contained in L((ab)*), so a RE for h(L) is (ab)*.
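The example can be checked on short strings. This sketch extends h symbol by symbol, computes h(w) for every w in L(01* + 10*) up to a length bound (using Python's `re` with + written as |), and compares the images with (ab)*. The helper names are hypothetical, and a bounded check only supports the simplification; it does not prove it.

```python
import itertools
import re

H = {"0": "ab", "1": ""}            # h(0) = ab, h(1) = ε

def h(w):
    """Extend h to strings: h(a1...an) = h(a1)...h(an)."""
    return "".join(H[a] for a in w)

L_RE = re.compile(r"01*|10*")       # 01* + 10*
AB_STAR = re.compile(r"(ab)*")      # the simplified RE for h(L)

def image_up_to(max_len):
    """The set {h(w) : w in L, |w| <= max_len}."""
    words = ("".join(t) for n in range(max_len + 1)
             for t in itertools.product("01", repeat=n))
    return {h(w) for w in words if L_RE.fullmatch(w)}
```

For example, `h("01010")` is `ababab`, and `image_up_to(4)` is {ε, ab, abab, ababab}: the 01* branch contributes only ab, while 10^k contributes (ab)^k.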
Regular Languages: Summary
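Regular Languages
 Deterministic finite automata (DFA)
 Nondeterministic finite automata (NFA)
 Nondeterministic finite automata with ε-transitions (ε-NFA)
 Regular expressions (RE)
[Figure: RE, ε-NFA, NFA, and DFA as mutually convertible formalisms]
L(RE) = L(ε-NFA) = L(NFA) = L(DFA) = regular languages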
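Properties of Regular Languages
 Pumping lemma (a necessary but not sufficient condition)
   Can be used to prove that a particular language is not regular
   Cannot be used to prove that a particular language is regular
 Decision properties
   Whether the language accepted by an automaton is empty
   Whether a string w is accepted by an automaton
   Whether two automata are equivalent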
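Properties of Regular Languages
Closure properties: regular languages are closed under
 union
 intersection
 complement
 difference
 reversal
 closure (Kleene star)
 concatenation
 homomorphism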