Compiler Principles Courseware - Xidian University Personal Homepage


Summary of Lexical Analysis
Cong Tian
1
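General Method and Steps for Constructing a Lexical Analyzer
1. Describe: specify each token pattern with a regular expression.
2. Construct NFAs: build an NFA for each regular expression.
3. Determinize: convert the NFA into an equivalent DFA.
4. Minimize: optimize the DFA so that it has the fewest possible states.
5. Construct the lexical analyzer: generate it from the DFA (table-driven, direct-coded, or LEX).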
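Step 5 in its table-driven form can be sketched as follows. This is a minimal illustration, assuming a single hypothetical token pattern letter(letter|digit)* and a hand-written transition table; the names `TABLE`, `classify`, and `scan_identifier` are illustrative and do not come from the slides.

```python
# Hypothetical DFA for the pattern letter(letter|digit)*.
# State 0 is the start state; state 1 is accepting (inside an identifier).
TABLE = {
    (0, "letter"): 1,
    (1, "letter"): 1,
    (1, "digit"): 1,
}
ACCEPTING = {1}


def classify(ch):
    """Map a character to the character class used as a table column."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"


def scan_identifier(text, start=0):
    """Return the longest identifier beginning at `start`, or None."""
    state, pos, last_accept = 0, start, None
    while pos < len(text):
        nxt = TABLE.get((state, classify(text[pos])))
        if nxt is None:          # no transition: stop scanning
            break
        state, pos = nxt, pos + 1
        if state in ACCEPTING:   # remember the longest match so far
            last_accept = pos
    return None if last_accept is None else text[start:last_accept]
```

The scanner only consults the table, so supporting another token pattern would mean changing the table rather than the driver loop.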
2
Formal Concepts Involved
• Regular expressions
• NFA
• DFA
Regular language (regular set): strings such as abb, A111, …
Regular expression, NFA, DFA: for example, Char(char|digit)*, …
3
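Regular Languages
[Figure: nested language classes, from outermost to innermost: recursively enumerable languages, context-sensitive languages, context-free languages, regular languages.]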
4
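Regular Languages
• Regular languages
  • Regular expressions
  • Finite-state automata
• Context-free languages
  • Context-free grammars
  • Nondeterministic pushdown automata
• Context-sensitive languages
  • Context-sensitive grammars
  • Linear bounded automata (a restricted kind of Turing machine)
• Recursively enumerable languages
  • Phrase-structure grammars
  • Turing machines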
5
Topics on Regular Languages
• How do we prove that regular expressions and NFAs are equivalent?
  (1) We need to show that for every RE, there is an automaton that accepts the same language.
  (2) And for every automaton, there is an RE defining its language.
6
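RE to ε-NFA: Basis
• Symbol a: [Figure: a start state with a single transition labeled a to an accepting state.]
• ε: [Figure: a start state with a single ε-transition to an accepting state.]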
7
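RE to ε-NFA: Induction 1 – Union
[Figure: ε-NFA for E1 ∪ E2. A new start state has ε-transitions to the automata for E1 and for E2; the accepting state of each has an ε-transition to a new accepting state.]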
8
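RE to ε-NFA: Induction 2 – Concatenation
[Figure: ε-NFA for E1E2. The accepting state of the automaton for E1 has an ε-transition to the start state of the automaton for E2.]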
9
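RE to ε-NFA: Induction 3 – Closure
[Figure: ε-NFA for E*. A new start state has ε-transitions to the start state of the automaton for E and to a new accepting state; the accepting state of the automaton for E has ε-transitions back to its own start state and on to the new accepting state.]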
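The basis and the three induction steps can be sketched in code. This is a minimal illustration, assuming an ε-NFA is represented as a triple (start, accept, transitions) with `None` standing for ε; the function names and the example expression (a+b)*abb are choices made here, not taken from the slides.

```python
import itertools

_ids = itertools.count()  # fresh state numbers


def symbol(a):
    """Basis: one transition labeled a (pass None for an ε-transition)."""
    s, f = next(_ids), next(_ids)
    return s, f, [(s, a, f)]


def union(n1, n2):
    """New start and accept states, joined to both automata by ε-moves."""
    (s1, f1, t1), (s2, f2, t2) = n1, n2
    s, f = next(_ids), next(_ids)
    return s, f, t1 + t2 + [(s, None, s1), (s, None, s2),
                            (f1, None, f), (f2, None, f)]


def concat(n1, n2):
    """ε-move from the accept state of the first to the start of the second."""
    (s1, f1, t1), (s2, f2, t2) = n1, n2
    return s1, f2, t1 + t2 + [(f1, None, s2)]


def star(n):
    """Loop back for repetition and bypass for the empty string."""
    s1, f1, t1 = n
    s, f = next(_ids), next(_ids)
    return s, f, t1 + [(s, None, s1), (f1, None, f),
                       (f1, None, s1), (s, None, f)]


def eclose(states, trans):
    """All states reachable from `states` using ε-transitions only."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for p, a, r in trans:
            if p == q and a is None and r not in seen:
                seen.add(r)
                stack.append(r)
    return seen


def accepts(nfa, w):
    """Simulate the ε-NFA on w by tracking the set of current states."""
    s, f, trans = nfa
    cur = eclose({s}, trans)
    for c in w:
        cur = eclose({r for p, a, r in trans if p in cur and a == c}, trans)
    return f in cur


# Example: (a+b)*abb
ABB = concat(star(union(symbol("a"), symbol("b"))),
             concat(symbol("a"), concat(symbol("b"), symbol("b"))))
```

Each constructor adds at most two states, so the automaton stays linear in the size of the expression.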
10
Topics on Regular Languages
• How do we prove that regular expressions and NFAs are equivalent?
  (1) We need to show that for every RE, there is an automaton that accepts the same language.
  (2) And for every automaton, there is an RE defining its language.
11
From Automata to RE
• Arden's rule
  • For any sets of strings S and T, the equation X = SX + T has X = S*T as a solution. Moreover, this solution is unique if ε is not in S.
12
From Automata to RE
• Given an automaton A:
  • A has states {q0, …, qn}, with q0 being the start state.
  • Let Xi denote the set of strings accepted by A starting in state qi.
  • Thus, L(A) = X0.
  • We can write an equation for each Xi, defining it in terms of the sets corresponding to its successor states.
13
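From Automata to RE
[Figure: automaton A0 over {a, b, c} with states q0 (start), q1, q2, q3. Transitions: q0 goes to q1 on a and to q3 on b, c; q1 goes to q3 on a, to q2 on b, and to q0 on c; q2 goes to q3 on a, b and to q0 on c; q3 loops on a, b, c.]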
14
From Automata to RE
[Figure: automaton A0 with states q0, q1, q2, q3 over {a, b, c}.]
Equations for A0:
(0) X0 = aX1 + bX3 + cX3
(1) X1 = aX3 + bX2 + cX0 + ε
(2) X2 = aX3 + bX3 + cX0
(3) X3 = aX3 + bX3 + cX3
From (3), X3 = (a+b+c)X3 + ∅, so by Arden's rule:
X3 = (a+b+c)*∅ = ∅
With X3 = ∅, the system reduces to:
(0) X0 = aX1
(1) X1 = bX2 + cX0 + ε
(2) X2 = cX0
Substituting (0) and (2) in (1):
X1 = (bc+c)aX1 + ε
   = ((bc+c)a)* (by Arden's rule)
X0 = a((bc+c)a)*
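The result X0 = a((bc+c)a)* can be sanity-checked by brute force. The sketch below reads the transitions of A0 off equations (0) to (3) (the ε in equation (1) makes q1 the only accepting state) and compares the automaton with the expression on all short strings; the helper names and the length bound are arbitrary choices made here.

```python
import itertools
import re

# Transitions read off equations (0)-(3); q1 is accepting because of the ε term.
DELTA = {
    "q0": {"a": "q1", "b": "q3", "c": "q3"},
    "q1": {"a": "q3", "b": "q2", "c": "q0"},
    "q2": {"a": "q3", "b": "q3", "c": "q0"},
    "q3": {"a": "q3", "b": "q3", "c": "q3"},
}
FINAL = {"q1"}
X0 = re.compile(r"a((bc|c)a)*")  # the derived expression, with | for +


def dfa_accepts(w):
    """Run A0 on w starting from q0."""
    state = "q0"
    for ch in w:
        state = DELTA[state][ch]
    return state in FINAL


def agree_up_to(max_len):
    """True if A0 and the expression agree on every string of length <= max_len."""
    for n in range(max_len + 1):
        for tup in itertools.product("abc", repeat=n):
            w = "".join(tup)
            if dfa_accepts(w) != (X0.fullmatch(w) is not None):
                return False
    return True
```

Agreement on short strings is evidence rather than a proof; the proof is the derivation by Arden's rule.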
15
DFA-to-RE
• Another approach
  • Page 93, Theorem 3.4 (textbook: 形式语言与自动机, "Formal Languages and Automata").
  • Induction on k-paths.
16
Algebraic Laws for RE's
• Union and concatenation behave somewhat like addition and multiplication.
  • + is commutative and associative:
    a+b = b+a, (a+b)+c = a+(b+c)
  • Concatenation is associative:
    (a.b).c = a.(b.c)
  • Concatenation distributes over +:
    a.(b+c) = a.b + a.c
  • Exception: concatenation is not commutative:
    a.b ≠ b.a
17
Identities and Annihilators
• ∅ is the identity for +.
  • R + ∅ = R.
• ε is the identity for concatenation.
  • εR = Rε = R.
• ∅ is the annihilator for concatenation.
  • ∅R = R∅ = ∅.
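These laws can be checked concretely on finite languages. The sketch below models a language as a Python set of strings; the sample languages R, S, and T are arbitrary choices made for illustration.

```python
def union(L, M):
    """L + M: strings in either language."""
    return L | M


def concat(L, M):
    """LM: every string of L followed by every string of M."""
    return {x + y for x in L for y in M}


EMPTY = set()     # the language ∅
EPSILON = {""}    # the language {ε}

# Arbitrary sample languages.
R = {"a", "ab"}
S = {"b", "ba"}
T = {"", "c"}


def check_laws():
    """Return True if the identity, annihilator, and distributive laws hold for R, S, T."""
    return (
        union(R, EMPTY) == R
        and concat(EPSILON, R) == R == concat(R, EPSILON)
        and concat(EMPTY, R) == EMPTY == concat(R, EMPTY)
        and concat(R, union(S, T)) == union(concat(R, S), concat(R, T))
        and concat(concat(R, S), T) == concat(R, concat(S, T))
    )
```

With these samples `concat(R, S)` and `concat(S, R)` differ, which illustrates that concatenation is not commutative.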
18
Decision Properties of Regular Languages
Cong Tian
19
Properties of Language Classes
• A language class is a set of languages.
  • We have one example: the regular languages.
• Language classes have two important kinds of properties:
  1. Decision properties.
  2. Closure properties.
20
Decision Properties
• A decision property for a class of languages is an algorithm that takes a formal description of a language (e.g., a DFA) and tells whether or not some property holds.
• Example: Is language L empty?
  • The representation is a DFA.
  • Can you tell if L(A) = ∅ for DFA A?
21
Why Decision Properties?
• When we talked about protocols represented as DFA's, we noted that important properties of a good protocol were related to the language of the DFA.
• Example: "Does the protocol terminate?" = "Is the language finite?"
• Example: "Can the protocol fail?" = "Is the language nonempty?"
22
Why Decision Properties – (2)
• We might want a "smallest" representation for a language, e.g., a minimum-state DFA or a shortest RE.
• If you can't decide "Are these two languages the same?" (i.e., do two DFA's define the same language?), then you can't find a "smallest."
23
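Closure Properties
• A closure property of a language class says that given languages in the class, an operator (e.g., union) produces another language in the same class.
• Example: the regular languages are closed under union, concatenation, and (Kleene) closure, directly by the definition of regular expressions:
  • ε is a regular expression.
  • If a is a symbol in Σ, then a is a regular expression.
  • If r and s are regular expressions over Σ, then
    (a) r|s is a regular expression;
    (b) rs is a regular expression;
    (c) r* is a regular expression.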

24
The Membership Question
• Our first decision property is the question: "Is string w in regular language L?" (the membership problem)
• Assume L is represented by a DFA A.
• Simulate the action of A on the sequence of input symbols forming w.
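The simulation can be sketched in a few lines. The DFA below is an assumption made for illustration (a three-state machine over {0, 1} that rejects any string containing two consecutive 1's); it is not claimed to be the automaton drawn in the example figure.

```python
# Hypothetical DFA: accepts binary strings with no two consecutive 1's.
DELTA = {
    ("A", "0"): "A", ("A", "1"): "B",
    ("B", "0"): "A", ("B", "1"): "C",
    ("C", "0"): "C", ("C", "1"): "C",   # C is a dead state
}
START = "A"
FINAL = {"A", "B"}


def accepts(w):
    """Follow one transition per input symbol, then test the final state."""
    state = START
    for symbol in w:
        state = DELTA[(state, symbol)]
    return state in FINAL
```

The test takes one table lookup per input symbol, so membership is decided in time linear in |w|.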
25
Example: Testing Membership
[Figure, repeated over six animation frames: a DFA with states A (start), B, and C reads the input 01011 one symbol at a time; markers show the next input symbol and the current state.]
31
The Emptiness Problem
• Given a regular language, does the language contain any string at all? (the emptiness problem)
• Assume the representation is a DFA.
• Construct the transition graph.
• Compute the set of states reachable from the start state.
• If any final state is reachable, then yes, else no.
32
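The emptiness test for slide 32 amounts to a graph search. A minimal sketch, assuming the DFA is given as a dictionary `delta[state][symbol] -> state`; the function names are illustrative.

```python
from collections import deque


def reachable(start, delta):
    """Breadth-first search over the transition graph from the start state."""
    seen = {start}
    queue = deque([start])
    while queue:
        q = queue.popleft()
        for r in delta.get(q, {}).values():
            if r not in seen:
                seen.add(r)
                queue.append(r)
    return seen


def is_empty(start, finals, delta):
    """The language is empty iff no final state is reachable."""
    return not (reachable(start, delta) & set(finals))
```

Input symbols play no role in the search; only the existence of an edge between two states matters.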
The Infiniteness Problem
• Is a given regular language infinite?
• Start with a DFA for the language.
• Key idea: if the DFA has n states, and the language contains any string of length n or more, then the language is infinite.
• Otherwise, the language is surely finite.
  • It is limited to strings of length less than n.
33
Proof of Key Idea
• If an n-state DFA accepts a string w of length n or more, then there must be a state that appears twice on the path labeled w from the start state to a final state.
• Because there are at least n+1 states along the path.
34
Proof – (2)
[Figure: the path for a string w with |w| = 5, drawn as a chain of states s0, s1, s2, s3, s4 with steps numbered 1 to 4.]
35
Finding Cycles
1. Eliminate states not reachable from the start state.
2. Eliminate states that do not reach a final state.
3. Test if the remaining transition graph has any cycles.
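The three steps translate directly into an infiniteness test. A minimal sketch, assuming the DFA is given as `delta[state][symbol] -> state`; the helper names and the depth-first cycle check are implementation choices made here.

```python
def _closure(seeds, edges):
    """All states reachable from `seeds` along `edges` (state -> set of states)."""
    seen, stack = set(seeds), list(seeds)
    while stack:
        q = stack.pop()
        for r in edges.get(q, ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen


def is_infinite(start, finals, delta):
    succ = {q: set(row.values()) for q, row in delta.items()}
    pred = {}
    for q, targets in succ.items():
        for r in targets:
            pred.setdefault(r, set()).add(q)

    # Steps 1 and 2: keep states reachable from the start that also reach a final state.
    live = _closure({start}, succ) & _closure(set(finals), pred)

    # Step 3: depth-first search for a cycle among the remaining states.
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {q: WHITE for q in live}

    def has_cycle(q):
        color[q] = GRAY
        for r in succ.get(q, ()):
            if r in live and (color[r] == GRAY or
                              (color[r] == WHITE and has_cycle(r))):
                return True
        color[q] = BLACK
        return False

    return any(color[q] == WHITE and has_cycle(q) for q in live)
```

A cycle among the surviving states can be repeated any number of times on the way to a final state, which is why its existence implies an infinite language.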
36
The Pumping Lemma
• We have, almost accidentally, proved a statement that is quite useful for showing certain languages are not regular.
• It is called the pumping lemma for regular languages.
37
Statement of the Pumping Lemma
For every regular language L,
there is an integer n (the number of states of a DFA for L), such that
for every string w in L of length ≥ n,
we can write w = xyz such that:
1. |xy| ≤ n.
2. |y| > 0.
3. For all i ≥ 0, x y^i z is in L.
Here y is the sequence of labels along the first cycle on the path labeled w.
38
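Example: Use of Pumping Lemma
The pumping lemma is a necessary but not sufficient condition for a language to be regular. Every regular language must satisfy it. If a language does not satisfy the pumping lemma, it is certainly not regular; if it does satisfy the lemma, it is still not necessarily regular.
• The lemma is used to prove that a language is not regular.
• Example: {0^k 1^k | k ≥ 1} is not a regular language.
  • Suppose it were. Then there would be an associated n for the pumping lemma.
  • Let w = 0^n 1^n. Since |xy| ≤ n, we can write w = xyz, where x and y consist of 0's, and y ≠ ε.
  • But then xyyz would be in L, and this string has more 0's than 1's, a contradiction.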
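The argument can be illustrated by exhaustive search over the allowed decompositions. The sketch below tries every split w = xyz with |xy| ≤ n and |y| > 0 and checks a few values of i; the function names and the bound `max_i` are choices made here. A split that fails for some small i certainly violates condition 3, so a negative answer is sound.

```python
def in_L(s):
    """Membership in {0^k 1^k | k >= 1}."""
    k = len(s) // 2
    return k >= 1 and s == "0" * k + "1" * k


def pumpable(w, n, max_i=3):
    """True if some split w = xyz with |xy| <= n and |y| > 0
    keeps x y^i z in L for every i in 0..max_i."""
    for xy_len in range(1, min(n, len(w)) + 1):
        for x_len in range(xy_len):
            x, y, z = w[:x_len], w[x_len:xy_len], w[xy_len:]
            if all(in_L(x + y * i + z) for i in range(max_i + 1)):
                return True
    return False
```

For w = 0^n 1^n no split survives, whatever n is chosen, which matches the contradiction derived above.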
39
Decision Property: Equivalence
• Given regular languages L and M, is L = M?
• The algorithm involves constructing the product DFA from DFA's for L and M.
• Let these DFA's have sets of states Q and R, respectively.
• The product DFA has set of states Q × R.
  • I.e., pairs [q, r] with q in Q, r in R.
40
Product DFA – Continued
• Start state = [q0, r0] (the start states of the DFA's for L, M).
• Transitions: δ([q,r], a) = [δL(q,a), δM(r,a)]
  • δL, δM are the transition functions for the DFA's of L, M.
  • That is, we simulate the two DFA's in the two state components of the product DFA.
41
Example: Product DFA
[Figure: a DFA with states A, B and a DFA with states C, D, both over {0, 1}, together with their product DFA on the states [A,C], [A,D], [B,C], [B,D].]
42
Equivalence Algorithm
• Make the final states of the product DFA be those states [q, r] such that exactly one of q and r is a final state of its own DFA.
• Thus, the product accepts w iff w is in exactly one of L and M.
43
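Example: Equivalence
[Figure: the two DFA's with states A, B and C, D and their product DFA on [A,C], [A,D], [B,C], [B,D], with the final states chosen as in the equivalence algorithm.]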
44
Equivalence Algorithm – (2)
• The product DFA's language is empty iff L = M.
• But we already have an algorithm to test whether the language of a DFA is empty.
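Product construction and emptiness test can be combined into one search. A minimal sketch, assuming each DFA is a triple (start, finals, delta) with `delta[state][symbol] -> state` defined for every symbol; the product states are explored on the fly rather than built up front, which is an implementation choice made here.

```python
from collections import deque


def equivalent(dfa_l, dfa_m, alphabet):
    """True iff no reachable product state [q, r] has exactly one final component."""
    (q0, final_l, delta_l), (r0, final_m, delta_m) = dfa_l, dfa_m
    seen = {(q0, r0)}
    queue = deque([(q0, r0)])
    while queue:
        q, r = queue.popleft()
        if (q in final_l) != (r in final_m):
            return False          # a final state of the product is reachable
        for a in alphabet:
            nxt = (delta_l[q][a], delta_m[r][a])
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True
```

Only reachable pairs are visited, so at most |Q| × |R| product states are examined.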
45
Decision Property: Containment
• Given regular languages L and M, is L ⊆ M?
• The algorithm also uses the product automaton.
• How do you define the final states [q, r] of the product so its language is empty iff L ⊆ M?
Answer: q is final; r is not.
46
Example: Containment
[Figure: the two DFA's with states A, B and C, D and their product DFA on [A,C], [A,D], [B,C], [B,D].]
Note: the only final state is unreachable, so containment holds.
47
The Minimum-State DFA for a Regular Language
• In principle, since we can test for equivalence of DFA's, we can, given a DFA A, find the DFA with the fewest states accepting L(A).
• Test all smaller DFA's for equivalence with A.
• But that's a terrible algorithm.
48
Efficient State Minimization
• Table-filling method
• Indistinguishable states:
        r   b
    A   B   C
    B   B   C
• Find as many distinguishable pairs of states as possible.
49
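Efficient State Minimization
• Basis: if p is an accepting state and q is a non-accepting state, then {p, q} is distinguishable.
• Induction: for states p and q, if r = δ(p,a) and s = δ(q,a) are distinguishable for some input symbol a, then p and q are distinguishable.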
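The basis and induction can be sketched as a fixed-point computation. The DFA below is the seven-state machine A to G (inputs r and b, accepting states F and G) from the state-minimization example in these slides; the function names and the representation of a pair as a `frozenset` are choices made here.

```python
from itertools import combinations

# DFA from the state-minimization example; F and G are accepting.
DELTA = {
    "A": {"r": "B", "b": "C"},
    "B": {"r": "D", "b": "E"},
    "C": {"r": "D", "b": "F"},
    "D": {"r": "D", "b": "G"},
    "E": {"r": "D", "b": "G"},
    "F": {"r": "D", "b": "C"},
    "G": {"r": "D", "b": "G"},
}
FINAL = {"F", "G"}


def distinguishable_pairs(delta, finals):
    states = sorted(delta)
    # Basis: one state accepting, the other not.
    marked = {frozenset(p) for p in combinations(states, 2)
              if (p[0] in finals) != (p[1] in finals)}
    # Induction: repeat until no new pair can be marked.
    changed = True
    while changed:
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            if pair in marked:
                continue
            for a in delta[p]:
                succ = frozenset((delta[p][a], delta[q][a]))
                if len(succ) == 2 and succ in marked:
                    marked.add(pair)
                    changed = True
                    break
    return marked


def indistinguishable_pairs(delta, finals):
    all_pairs = {frozenset(p) for p in combinations(sorted(delta), 2)}
    return all_pairs - distinguishable_pairs(delta, finals)
```

On this DFA the only pair left unmarked is {D, E}, which agrees with the example's conclusion that D and E can be merged.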
50
Example: State Minimization
Remember this DFA? It was constructed for the chessboard NFA by the subset construction.
                  r            b
    {1}           {2,4}        {5}
    {2,4}         {2,4,6,8}    {1,3,5,7}
    {5}           {2,4,6,8}    {1,3,7,9}
    {2,4,6,8}     {2,4,6,8}    {1,3,5,7,9}
    {1,3,5,7}     {2,4,6,8}    {1,3,5,7,9}
  * {1,3,7,9}     {2,4,6,8}    {5}
  * {1,3,5,7,9}   {2,4,6,8}    {1,3,5,7,9}
Here it is with more convenient state names:
        r   b
    A   B   C
    B   D   E
    C   D   F
    D   D   G
    E   D   G
  * F   D   C
  * G   D   G
51
Efficient State Minimization
[Figure, repeated over eight animation frames: the DFA table with states A to G next to a triangular table of state pairs (rows B to G, columns A to F), in which distinguishable pairs are marked X round by round.]
59
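Example – Concluded
Original DFA:
        r   b
    A   B   C
    B   D   E
    C   D   F
    D   D   G
    E   D   G
  * F   D   C
  * G   D   G
Completed table of distinguishable pairs (X = distinguishable):
    B   X
    C   X   X
    D   X   X   X
    E   X   X   X
    F   X   X   X   X   X
    G   X   X   X   X   X   X
        A   B   C   D   E   F
Only the pair {D, E} is left unmarked.
Replace D and E by H.
Result is the minimum-state DFA:
        r   b
    A   B   C
    B   H   H
    C   H   F
    H   H   G
  * F   H   C
  * G   H   G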
60
Eliminating Unreachable States
• Unfortunately, combining indistinguishable states could leave us with unreachable states in the "minimum-state" DFA.
• Thus, before or after, remove states that are not reachable from the start state.
61
Closure Under Union
• If L and M are regular languages, so is L ∪ M.
• Proof: Let L and M be the languages of regular expressions R and S, respectively.
• Then R+S is a regular expression whose language is L ∪ M.
62
Closure Under Concatenation and Kleene Closure
• Same idea:
  • RS is a regular expression whose language is LM.
  • R* is a regular expression whose language is L*.
63
Closure Under Intersection
• If L and M are regular languages, then so is L ∩ M.
• Proof: Let A and B be DFA's whose languages are L and M, respectively.
• Construct C, the product automaton of A and B.
• Make the final states of C be the pairs consisting of final states of both A and B.
64
Example: Product DFA for Intersection
[Figure: the two DFA's with states A, B and C, D and their product DFA on [A,C], [A,D], [B,C], [B,D], with final states chosen for intersection.]
65
Closure Under Difference
• If L and M are regular languages, then so is L – M = strings in L but not M.
• Proof: Let A and B be DFA's whose languages are L and M, respectively.
• Construct C, the product automaton of A and B.
• Make the final states of C be the pairs where the A-state is final but the B-state is not.
66
Example: Product DFA for Difference
[Figure: the two DFA's with states A, B and C, D and their product DFA on [A,C], [A,D], [B,C], [B,D], with final states chosen for difference.]
67
Closure Under Complementation
• The complement of a language L (with respect to an alphabet Σ such that Σ* contains L) is Σ* – L.
• Since Σ* is surely regular, closure under difference implies that the complement of a regular language is always regular.
68
Closure Under Reversal
• Recall the example of a DFA that accepted the binary strings that, as integers, were divisible by 23.
• We said that the language of binary strings whose reversal was divisible by 23 was also regular, but the DFA construction was very tricky.
• This is a good application of reversal-closure.
69
Closure Under Reversal – (2)
• Given language L, L^R is the set of strings whose reversal is in L.
• Example: L = {0, 01, 100}; L^R = {0, 10, 001}.
• Proof: Let E be a regular expression for L.
• We show how to reverse E, to provide a regular expression E^R for L^R.
70
Reversal of a Regular Expression
• Basis: If E is a symbol a, ε, or ∅, then E^R = E.
• Induction: If E is
  • F+G, then E^R = F^R + G^R.
  • FG, then E^R = G^R F^R.
  • F*, then E^R = (F^R)*.
71
Example: Reversal of a RE
• Let E = 01* + 10*.
• E^R = (01* + 10*)^R = (01*)^R + (10*)^R
  • = (1*)^R 0^R + (0*)^R 1^R
  • = (1^R)*0 + (0^R)*1
  • = 1*0 + 0*1.
72
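The basis and induction rules for reversal on slides 71 and 72 form a short recursive function over the expression tree. A minimal sketch, assuming regular expressions are represented as nested tuples such as `("concat", F, G)`; this representation and the pretty-printer `show` are choices made here.

```python
def reverse(e):
    """Apply the basis and induction rules for E^R to an expression tree."""
    kind = e[0]
    if kind in ("sym", "eps", "empty"):       # basis: E^R = E
        return e
    if kind == "union":                       # (F+G)^R = F^R + G^R
        return ("union", reverse(e[1]), reverse(e[2]))
    if kind == "concat":                      # (FG)^R = G^R F^R
        return ("concat", reverse(e[2]), reverse(e[1]))
    if kind == "star":                        # (F*)^R = (F^R)*
        return ("star", reverse(e[1]))
    raise ValueError(f"unknown node kind: {kind}")


def show(e):
    """Render an expression tree in the notation used on the slides."""
    kind = e[0]
    if kind == "sym":
        return e[1]
    if kind == "eps":
        return "ε"
    if kind == "empty":
        return "∅"
    if kind == "union":
        return show(e[1]) + " + " + show(e[2])
    if kind == "concat":
        parts = [show(x) if x[0] != "union" else "(" + show(x) + ")"
                 for x in e[1:]]
        return "".join(parts)
    inner = show(e[1])                        # kind == "star"
    return inner + "*" if e[1][0] == "sym" else "(" + inner + ")*"


ZERO, ONE = ("sym", "0"), ("sym", "1")
# E = 01* + 10*
E = ("union", ("concat", ZERO, ("star", ONE)), ("concat", ONE, ("star", ZERO)))
```

Only the concatenation rule changes the shape of the tree; union and star simply pass the reversal down to their operands.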
Homomorphisms
• A homomorphism on an alphabet is a function that gives a string for each symbol in that alphabet.
• Example: h(0) = ab; h(1) = ε.
• Extend to strings by h(a1…an) = h(a1)…h(an).
• Example: h(01010) = ababab.
73
Closure Under Homomorphism
• If L is a regular language, and h is a homomorphism on its alphabet, then h(L) = {h(w) | w is in L} is also a regular language.
• Proof: Let E be a regular expression for L.
• Apply h to each symbol in E.
• The language of the resulting RE is h(L).
74
Example: Closure under Homomorphism
• Let h(0) = ab; h(1) = ε.
• Let L be the language of regular expression 01* + 10*.
• Then h(L) is the language of regular expression abε* + ε(ab)*.
Note: use parentheses to enforce the proper grouping.
75
Example – Continued
• abε* + ε(ab)* can be simplified.
• ε* = ε, so abε* = abε.
• ε is the identity under concatenation.
  • That is, εE = Eε = E for any RE E.
• Thus, abε* + ε(ab)* = abε + ε(ab)* = ab + (ab)*.
• Finally, L(ab) is contained in L((ab)*), so a RE for h(L) is (ab)*.
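The example can be checked on short strings. The sketch below applies h symbol by symbol and collects the images of all strings of L(01* + 10*) up to a length bound; Python's `re` syntax writes | for +, and the helper names and the bound are choices made here.

```python
import itertools
import re

H = {"0": "ab", "1": ""}          # h(0) = ab, h(1) = ε


def h(w):
    """Extend h to strings: h(a1...an) = h(a1)...h(an)."""
    return "".join(H[c] for c in w)


L_RE = re.compile(r"01*|10*")     # the language L
HL_RE = re.compile(r"(ab)*")      # the simplified expression for h(L)


def image_up_to(max_len):
    """h(w) for every w in L with |w| <= max_len."""
    out = set()
    for n in range(max_len + 1):
        for tup in itertools.product("01", repeat=n):
            w = "".join(tup)
            if L_RE.fullmatch(w) is not None:
                out.add(h(w))
    return out
```

Every image lies in L((ab)*), and the strings 10^k already produce every (ab)^k, which is why the simplified expression loses nothing.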
76
Inverse Homomorphisms
• Let h be a homomorphism and L a language whose alphabet is the output language of h.
• h⁻¹(L) = {w | h(w) is in L}.
77
Example: Inverse Homomorphism
• Let h(0) = ab; h(1) = ε.
• Let L = {abab, baba}.
• h⁻¹(L) = the language with two 0's and any number of 1's = L(1*01*01*).
Notice: no string maps to baba; any string with exactly two 0's maps to abab.
78
Closure Proof for Inverse Homomorphism
• Start with a DFA A for L.
• Construct a DFA B for h⁻¹(L) with:
  • The same set of states.
  • The same start state.
  • The same final states.
  • Input alphabet = the symbols to which homomorphism h applies.
79
Proof – (2)
• The transitions for B are computed by applying h to an input symbol a and seeing where A would go on the sequence of input symbols h(a).
• Formally, δB(q, a) = δA(q, h(a)).
80
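The construction δB(q, a) = δA(q, h(a)) from slide 80 can be sketched on the earlier example with h(0) = ab, h(1) = ε, and L = {abab, baba}. The trie-shaped DFA A below (states are prefixes of words in L, plus a dead state) is an assumption made for illustration, as are the function names.

```python
import itertools
import re

H = {"0": "ab", "1": ""}                    # h(0) = ab, h(1) = ε
WORDS = {"abab", "baba"}                    # the finite language L
PREFIXES = {w[:i] for w in WORDS for i in range(len(w) + 1)}
DEAD = None                                 # dead state of A


def delta_a(q, c):
    """DFA A for L: move to the longer prefix if it exists, else to DEAD."""
    if q is DEAD or q + c not in PREFIXES:
        return DEAD
    return q + c


def delta_a_hat(q, s):
    """Extended transition function of A on a string s."""
    for c in s:
        q = delta_a(q, c)
    return q


def delta_b(q, a):
    """Transition of B: δB(q, a) = δA(q, h(a))."""
    return delta_a_hat(q, H[a])


def b_accepts(w):
    """B has the same states, start state "" and final states as A."""
    q = ""
    for a in w:
        q = delta_b(q, a)
    return q in WORDS


def matches_expected(max_len):
    """Compare B with 1*01*01* on all binary strings up to max_len."""
    expected = re.compile(r"1*01*01*")
    return all(
        b_accepts("".join(t)) == (expected.fullmatch("".join(t)) is not None)
        for n in range(max_len + 1)
        for t in itertools.product("01", repeat=n)
    )
```

B reuses A's states unchanged; only the transition function is recomputed, one lookup of h(a) per input symbol.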
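Example: Inverse Homomorphism Construction
[Figure: a DFA with states A, B, C over {a, b} and the constructed DFA over {0, 1} on the same states, with h(0) = ab and h(1) = ε. Each state has a self-loop on 1 since h(1) = ε; each transition on 0 follows the path labeled ab in the original DFA since h(0) = ab.]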
81
Proof – (3)
• Induction on |w| shows that δB(q0, w) = δA(q0, h(w)).
• Basis: w = ε.
• δB(q0, ε) = q0, and δA(q0, h(ε)) = δA(q0, ε) = q0.
82
Proof – (4)
• Induction: Let w = xa; assume the IH for x.
• δB(q0, w) = δB(δB(q0, x), a)
  • = δB(δA(q0, h(x)), a) by the IH.
  • = δA(δA(q0, h(x)), h(a)) by definition of the DFA B.
  • = δA(q0, h(x)h(a)) by definition of the extended delta.
  • = δA(q0, h(w)) by definition of homomorphism.
83
Summary of Regular Languages
Cong Tian
84
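Regular Languages
• Deterministic finite automata (DFA)
• Nondeterministic finite automata (NFA)
• Nondeterministic finite automata with ε-transitions (ε-NFA)
• Regular expressions (RE)
L(RE) = L(ε-NFA) = L(NFA) = L(DFA) = the regular languages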
85
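Properties of Regular Languages
• Pumping lemma (a necessary but not sufficient condition)
  • Can be used to prove that a particular language is not regular.
  • Cannot be used to prove that a particular language is regular.
• Decidability
  • Whether the language accepted by an automaton is empty.
  • Whether a string w is accepted by a given automaton.
  • Whether two automata are equivalent.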
86
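Properties of Regular Languages
Closure properties: the regular languages are closed under
• Union
• Intersection
• Complement
• Difference
• Reversal
• Kleene closure
• Concatenation
• Homomorphism
• Inverse homomorphism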
87