CHAPTER 5 Compiler
5.1 Basic Compiler Concepts
Figure: functions performed by a compiler
Source program -> Lexical analysis -> Token -> Syntax analysis -> Parse tree -> Intermediate code generation -> Intermediate code -> Code optimization -> Intermediate code -> Code generation -> Machine code
Table management and error handling support all of these phases.
1. Lexical Analysis (Lexical Analyzer or Scanner)
Read the source program one character at a time, carving the source program into a sequence of atomic units called tokens.
Token (token type, token value)
PROGRAM MAIN;
VARIABLE INTEGER:U,V,M;
U = 5;
V = 7;
CALL S1(U ,V , M );
ENP;
SUBROUTINE S1( INTEGER : X , Y , M ) ;
M = X + Y + 2.7;
ENS;
A program written in the FRANCIS language
Source statement                          Tokens
PROGRAM MAIN ;                            (2,21) (5,3) (1,1)
VARIABLE INTEGER : U , V , M ;            (2,25) (2,14) (1,12) (5,1) (1,11) (5,5) (1,11) (5,6) (1,1)
U = 5 ;                                   (5,1) (1,4) (3,1) (1,1)
V = 7 ;                                   (5,5) (1,4) (3,2) (1,1)
CALL S1 ( U , V , M ) ;                   (2,3) (5,10) (1,2) (5,1) (1,11) (5,5) (1,11) (5,6) (1,3) (1,1)
ENP ;                                     (2,6) (1,1)
SUBROUTINE S1 ( INTEGER : X , Y , M ) ;   (2,23) (5,10) (1,2) (2,14) (1,12) (5,8) (1,11) (5,4) (1,11) (5,9) (1,3) (1,1)
M = X + Y + 2.7 ;                         (5,9) (1,4) (5,8) (1,5) (5,4) (1,5) (4,1) (1,1)
ENS ;                                     (2,7) (1,1)
The FRANCIS program converted into token format
2. Syntax Analysis (Syntax Analyzer or Parser)
The grammar specifies the form, or syntax, of legal statements in the language.
<id-list> ::= id | <id-list> , id
<assign>  ::= id := <exp>
<exp>     ::= <term> | <exp> + <term> | <exp> - <term>
<term>    ::= <factor> | <term> * <factor> | <term> DIV <factor>
<factor>  ::= id | int | ( <exp> )
<read>    ::= READ ( <id-list> )
<write>   ::= WRITE ( <id-list> )
Partial grammar of the Pascal language
<id-list> ::= id | <id-list> , id
<assign>  ::= id := <exp>
<read>    ::= READ ( <id-list> )
<write>   ::= WRITE ( <id-list> )
Partial grammar of the Pascal language

Parse Tree
<read>
    READ
    (
    <id-list>
        id {VALUE}
    )
Parse tree for the statement READ (VALUE)
<assign>  ::= id := <exp>
<exp>     ::= <term> | <exp> + <term> | <exp> - <term>
<term>    ::= <factor> | <term> * <factor> | <term> DIV <factor>
<factor>  ::= id | int | ( <exp> )
Partial grammar of the Pascal language

<assign>
    id {VARIANCE}
    :=
    <exp>
        <exp>
            <term>
                <term>
                    <factor>
                        id {SUMSQ}
                DIV
                <factor>
                    int {100}
        -
        <term>
            <term>
                <factor>
                    id {MEAN}
            *
            <factor>
                id {MEAN}
Parse tree for the statement VARIANCE := SUMSQ DIV 100 - MEAN * MEAN
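<assign>  ::= id := <exp>
<exp>     ::= <term> | <exp> + <term> | <exp> - <term>
<term>    ::= <factor> | <term> * <factor> | <term> DIV <factor>
<factor>  ::= id | int | ( <exp> )
Partial grammar of the Pascal language

Syntax Error
The statement A + / B cannot be parsed. A reduces to <factor> and then <term>, and B reduces to <factor>, but no rule of the grammar allows + to be followed directly by /, so the parse tree cannot be completed.
Attempted parse tree for the statement A + / B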
3. Intermediate Code Generation
Three Address Code
(operator, operand1, operand2, result)
A=B+C       (+,B,C,A)
SUM:=A/B*C can be decomposed into
T1=A/B      (/,A,B,T1)
T2=T1*C     (*,T1,C,T2)
SUM=T2      (=,T2, ,SUM)
<assign>
    id {SUM}
    :=
    <exp>
        <term>
            <term>
                <term>
                    <factor>
                        id {A}
                DIV
                <factor>
                    id {B}
            *
            <factor>
                id {C}
Parse tree for the statement SUM:=A/B*C
4. Code Optimization
Improve the intermediate code (or machine code) so that the final object program runs faster and/or takes less space.
Before optimization:
FOR I := 1 To 10 Do
begin
    A := 10;
    B[I+1] := C[I+1] + A;
end
After optimization:
A := 10;
FOR I := 1 To 10 Do
begin
    J := I + 1;
    B[J] := C[J] + A;
end
5. Code Generation
* Allocate memory locations
* Select machine code for each intermediate code
* Register allocation: utilize registers as efficiently as possible
From (+,B,C,A) we obtain
MOV AX,B
ADD AX,C
MOV A,AX
SUM:=A/B*C
(/,A,B,T1)      MOV AX,A
                DIV B
                MOV T1,AX
(*,T1,C,T2)     MOV AX,T1
                MUL C
                MOV T2,AX
(=,T2, ,SUM)    MOV AX,T2
                MOV SUM,AX
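The mapping from quadruples to instructions shown here can be sketched in a few lines of Python. This is only an illustrative sketch: the function name generate and the OPCODES table are hypothetical names introduced for the example, and the one-operand form of MUL and DIV simply mirrors the listing on the slide.

```python
# Hypothetical helper (not from the slides): translate quadruples into the
# accumulator-style instructions used in the listing, with AX as the only register.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

def generate(quads):
    code = []
    for op, arg1, arg2, result in quads:
        code.append(f"MOV AX,{arg1}")            # load the first operand into AX
        if op in ("*", "/"):
            code.append(f"{OPCODES[op]} {arg2}")  # MUL/DIV take one explicit operand
        elif op in ("+", "-"):
            code.append(f"{OPCODES[op]} AX,{arg2}")
        # op == "=" is a plain copy and needs no arithmetic instruction
        code.append(f"MOV {result},AX")          # store AX into the result
    return code

quads = [("/", "A", "B", "T1"), ("*", "T1", "C", "T2"), ("=", "T2", "", "SUM")]
for line in generate(quads):
    print(line)
```

Running it on the three quadruples of SUM:=A/B*C prints the same eight instructions as the listing.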
The code generated for SUM:=A/B*C is then optimized once more.
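The slide does not preserve which instructions this second optimization removes. One common possibility is a peephole pass that deletes a store to a temporary that is immediately followed by a load of the same temporary. The sketch below assumes that the temporaries T1 and T2 are never read again later; the function name peephole is hypothetical.

```python
def peephole(code, temps):
    """Remove 'MOV t,AX' directly followed by 'MOV AX,t' for each temporary t.

    Assumption: every temporary in `temps` is dead after the pair, so the
    store can be dropped together with the reload.
    """
    out = []
    i = 0
    while i < len(code):
        pair = code[i:i + 2]
        if any(pair == [f"MOV {t},AX", f"MOV AX,{t}"] for t in temps):
            i += 2              # the value is already in AX; skip both moves
        else:
            out.append(code[i])
            i += 1
    return out

code = ["MOV AX,A", "DIV B", "MOV T1,AX", "MOV AX,T1",
        "MUL C", "MOV T2,AX", "MOV AX,T2", "MOV SUM,AX"]
optimized = peephole(code, temps={"T1", "T2"})
print(optimized)
```

Under that assumption the eight instructions shrink to four: MOV AX,A, DIV B, MUL C, MOV SUM,AX.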
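6. Table Management and Error Handling
Token, symbol table, reserved word table, delimiter table, constant table, etc.
* If each of the five major functions is performed as a separate pass, the compiler makes five passes.
* Several functions can also be combined into a single pass.
* At least two passes are required.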
5.2 Grammar
1. Grammar
A Backus-Naur Form (BNF) grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
<id-list> ::= id | <id-list> , id
<assign>  ::= id := <exp>
<exp>     ::= <term> | <exp> + <term> | <exp> - <term>
<term>    ::= <factor> | <term> * <factor> | <term> DIV <factor>
<factor>  ::= id | int | ( <exp> )
<read>    ::= READ ( <id-list> )
<write>   ::= WRITE ( <id-list> )
Partial grammar of the Pascal language
Non-terminal symbols are the names enclosed in angle brackets, such as <exp>. Terminal symbols are the tokens themselves, such as id, int, READ and +.
2. Parse Tree (Syntax Tree)
It is often convenient to display the analysis of a source statement in terms of a grammar as a tree.
<read>
    READ
    (
    <id-list>
        id {VALUE}
    )
Parse tree for the statement READ (VALUE)
3. Precedence and associativity
Precedence: *, / > +, -
Associativity: a+b+c
Left associativity:  ( (a + b) + c )
Right associativity: ( a + (b + c) )
4. Ambiguous Grammar
A grammar is ambiguous if there is more than one possible parse tree for a given statement.
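<start> ::= <term>
<term>  ::= id | <term>+<term> | <term>-<term>

Two parse trees for the statement id + id - id:

Tree 1, grouped as (id + id) - id
<start>
    <term>
        <term>
            <term>
                id
            +
            <term>
                id
        -
        <term>
            id

Tree 2, grouped as id + (id - id)
<start>
    <term>
        <term>
            id
        +
        <term>
            <term>
                id
            -
            <term>
                id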
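To make the ambiguity concrete, the following Python sketch counts how many distinct parse trees the grammar <term> ::= id | <term>+<term> | <term>-<term> admits for a token sequence. The function name count_parse_trees is hypothetical, and the input is assumed to be a list of the strings "id", "+" and "-".

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count parse trees for <term> ::= id | <term>+<term> | <term>-<term>."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways <term> can derive tokens[lo:hi].
        if hi - lo == 1:
            return 1 if tokens[lo] == "id" else 0
        total = 0
        for k in range(lo + 1, hi - 1):          # k is the operator at the root
            if tokens[k] in ("+", "-"):
                total += count(lo, k) * count(k + 1, hi)
        return total

    return count(0, len(tokens))

print(count_parse_trees(["id", "+", "id", "-", "id"]))
```

For id + id - id the count is 2, one tree per grouping. An unambiguous grammar would give at most 1 for every input.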
5.3 Lexical Analysis
A program contains the following kinds of tokens:
a. Identifier
b. Delimiter
c. Reserved word
d. Constant: integer, float, string
1. Identifier
<ident>  ::= <letter> | <ident> <letter> | <ident> <digit>
<letter> ::= A | B | C | ...
<digit>  ::= 0 | 1 | 2 | ...
An identifier is a multiple-character token.
2. Token and Tables
Table 1 (Delimiter Table)
Entry  Delimiter
1      ;
2      (
3      )
4      =
5      +
6      -
7      *
8      /
9
10     ‘
11     ,
12     :
Table 2 (Reserved Word Table)
Entry  Reserved word
1      AND
2      BOOLEAN
3      CALL
4      DIMENSION
5      ELSE
6      ENP
7      ENS
8      EQ
9      GE
10     GT
11     GTO
12     IF
13     INPUT
14     INTEGER
15     LABEL
16     LE
17     LT
18     NE
19     OR
20     OUTPUT
21     PROGRAM
22     REAL
23     SUBROUTINE
24     THEN
25     VARIABLE
Table 3 (Integer Table)
Entry  Value
1      5
2      7

Table 4 (Real Number Table)
Entry  Value
1      2.7
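Table 5 (Identifier Table)
Entry  Identifier  Subroutine  Type  Pointer
1      U           3
2
3      MAIN
4      Y           10
5      V           3
6      M           3
7
8      X           10
9      M           10
10     S1
The Subroutine column holds the entry of the program unit (MAIN = 3, S1 = 10) in which the identifier is declared.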
Token Specifier: (Token Type, Token Value). The token type selects one of the tables, and the token value is the entry within that table.
PROGRAM MAIN ;    (2,21) (5,3) (1,1)
The remaining statements of the FRANCIS program are converted into token format in the same way.
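A scanner that produces (token type, token value) pairs from these tables can be sketched as follows. This is a simplified illustration, not the scanner used for the slides: the name scan and the regular expression are assumptions, only the delimiters whose entries are legible are included, and integers, reals and identifiers are entered in order of first appearance, so identifier entries will generally differ from Table 5.

```python
import re

# Table 1 (only the entries that are legible) and Table 2 from the slides.
DELIMITERS = {";": 1, "(": 2, ")": 3, "=": 4, "+": 5, "*": 7, "/": 8, ",": 11, ":": 12}
RESERVED = ["AND", "BOOLEAN", "CALL", "DIMENSION", "ELSE", "ENP", "ENS", "EQ", "GE",
            "GT", "GTO", "IF", "INPUT", "INTEGER", "LABEL", "LE", "LT", "NE", "OR",
            "OUTPUT", "PROGRAM", "REAL", "SUBROUTINE", "THEN", "VARIABLE"]

# A real constant, an integer constant, an identifier/reserved word, or a delimiter.
TOKEN_RE = re.compile(r"\s*(\d+\.\d+|\d+|[A-Za-z][A-Za-z0-9]*|[;()=+*/,:])")

def scan(source):
    """Return a list of (token type, token value) pairs for one piece of source."""
    integers, reals, identifiers = [], [], []     # Tables 3, 4 and 5, built on the fly

    def entry(table, item):
        # 1-based entry of item in table, appending it on first appearance.
        if item not in table:
            table.append(item)
        return table.index(item) + 1

    tokens, pos = [], 0
    source = source.rstrip()
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        text, pos = match.group(1), match.end()
        if text in DELIMITERS:
            tokens.append((1, DELIMITERS[text]))
        elif text in RESERVED:
            tokens.append((2, RESERVED.index(text) + 1))
        elif text.isdigit():
            tokens.append((3, entry(integers, text)))
        elif text[0].isdigit():
            tokens.append((4, entry(reals, text)))
        else:
            tokens.append((5, entry(identifiers, text)))
    return tokens

print(scan("U = 5;"))
```

Reserved words and delimiters receive the same specifiers as on the slides, for example ENP ; gives (2,6) (1,1).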
5.4 Syntax Analysis
1. Building the Parse Tree
a. Top down method
Begin with the rule of the grammar that specifies the root of the tree, and attempt to construct the tree so that the terminal nodes match the statement being analyzed.
b. Bottom up method
Begin with the terminal nodes of the tree, and attempt to combine these into successively higher-level nodes until the root is reached.
* Top down method
Figure: the parse tree for id + id - id, built from the root <start> downward through <term> nodes to the terminals id, +, id, -, id.
* Bottom up method
Figure: the parse tree for id + id - id, built from the terminals id, +, id, -, id upward by combining them into <term> nodes.
2. Operator Precedence Parser
Bottom up parser
Precedence Matrix (figure): the rows and columns are the tokens READ, ;, :=, +, -, (, ), id, and each entry holds the precedence relation (<, = or >) between the row token and the column token.
Operator precedence parse of READ(id);
Stack                        Input
<                            READ(id);
<READ                        (id)
<READ = (                    id)
<READ = ( <id                )
<READ = ( <id>               )
<READ = ( = id-list )
<READ = ( = id-list ) >
read
The reductions correspond to the parse tree for the statement READ (VALUE):
<read>
    READ
    (
    <id-list>
        id {VALUE}
    )
Operator precedence parse of id + id - id
Stack                Input
<                    id + id - id
<id                  + id - id
<id>                 + id - id
<term +              id - id
<term + <id>         - id
<term + term>        - id
<term - <            id
<term - <id>
<term - term>
term
Resulting parse tree:
<start>
    <term>
        <term>
            <term>
                id
            +
            <term>
                id
        -
        <term>
            id
A stack is generally used to save tokens that have been scanned but not yet parsed.
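The stack-based procedure can be sketched in Python for the small grammar <term> ::= id | <term>+<term> | <term>-<term>. The precedence relations in PREC are an assumption made for this example (the matrix on the slide is not fully legible), "$" is a hypothetical end marker used at both ends, and parse is a hypothetical function name.

```python
# Assumed precedence relations: PREC[top terminal on stack][next input terminal].
PREC = {
    "$":  {"id": "<", "+": "<", "-": "<"},
    "id": {"+": ">", "-": ">", "$": ">"},
    "+":  {"id": "<", "+": ">", "-": ">", "$": ">"},
    "-":  {"id": "<", "+": ">", "-": ">", "$": ">"},
}

def parse(tokens):
    """Return the reductions performed, in order, or raise SyntaxError."""
    stack = ["$"]
    tokens = list(tokens) + ["$"]
    reductions = []
    pos = 0
    while True:
        # Nonterminals are transparent: compare using the topmost terminal.
        top = next(s for s in reversed(stack) if s != "term")
        look = tokens[pos]
        if top == "$" and look == "$":
            if stack != ["$", "term"]:
                raise SyntaxError("incomplete statement")
            return reductions
        relation = PREC.get(top, {}).get(look)
        if relation == "<":                      # shift the next token
            stack.append(look)
            pos += 1
        elif relation == ">":                    # reduce the handle on top
            if stack[-1] == "id":
                stack[-1] = "term"
                reductions.append("term ::= id")
            elif stack[-3:] in (["term", "+", "term"], ["term", "-", "term"]):
                op = stack[-2]
                del stack[-2:]                   # leave a single term on the stack
                reductions.append(f"term ::= term {op} term")
            else:
                raise SyntaxError(f"cannot reduce before {look!r}")
        else:
            raise SyntaxError(f"no precedence relation between {top!r} and {look!r}")

print(parse(["id", "+", "id", "-", "id"]))
```

With these relations the reductions follow the trace for id + id - id: the sum is reduced first and the subtraction last, that is, the grouping (id + id) - id.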
3. Recursive Descent Parser
Top down method
a. Leftmost derivation
It must be possible to decide which alternative to use by examining the next input token. For example, the three alternatives of <stmt> begin with id, READ and WRITE respectively.
<stmt>    ::= <assign> | <read> | <write>
<id-list> ::= id | <id-list> , id
<assign>  ::= id := <exp>
<exp>     ::= <term> | <exp> + <term> | <exp> - <term>
<term>    ::= <factor> | <term> * <factor> | <term> DIV <factor>
<read>    ::= READ ( <id-list> )
<write>   ::= WRITE ( <id-list> )
Partial grammar of the Pascal language
b. Left recursion
A top-down parser cannot be used directly with a grammar that contains left recursion, because it cannot decide between the alternatives by examining the next token. For <id-list>, both alternatives, id and <id-list> , id, can begin with id.
<id-list> ::= id | <id-list> , id
<assign>  ::= id := <exp>
<exp>     ::= <term> | <exp> + <term> | <exp> - <term>
<term>    ::= <factor> | <term> * <factor> | <term> DIV <factor>
<factor>  ::= id | int | ( <exp> )
<read>    ::= READ ( <id-list> )
<write>   ::= WRITE ( <id-list> )
Partial grammar of the Pascal language
Modified for recursive descent parser
<id-list> ::= id { , id }
<assign>  ::= id := <exp>
<exp>     ::= <term> { + <term> | - <term> }
<term>    ::= <factor> { * <factor> | DIV <factor> }
<factor>  ::= id | int | ( <exp> )
<read>    ::= READ ( <id-list> )
<write>   ::= WRITE ( <id-list> )
Partial grammar of the Pascal language (braces { } denote zero or more repetitions)
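A recursive descent parser for the modified grammar has one procedure per nonterminal, and each repetition becomes a loop. The Python sketch below is one possible arrangement; the class and method names are hypothetical, the input is assumed to be already split into tokens, and the tree is returned as nested tuples purely for illustration.

```python
class Parser:
    KEYWORDS = {"READ", "WRITE", "DIV"}

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, token):
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, found {self.peek()!r}")
        self.pos += 1

    def ident(self):
        tok = self.peek()
        if tok is None or not tok.isidentifier() or tok in self.KEYWORDS:
            raise SyntaxError(f"expected identifier, found {tok!r}")
        self.pos += 1
        return tok

    def stmt(self):                      # <stmt> ::= <assign> | <read> | <write>
        tok = self.peek()
        if tok in ("READ", "WRITE"):     # the next token selects the alternative
            self.pos += 1
            self.expect("(")
            ids = self.id_list()
            self.expect(")")
            return (tok, ids)
        name = self.ident()              # <assign> ::= id := <exp>
        self.expect(":=")
        return (":=", name, self.exp())

    def id_list(self):                   # <id-list> ::= id { , id }
        ids = [self.ident()]
        while self.peek() == ",":
            self.pos += 1
            ids.append(self.ident())
        return ids

    def exp(self):                       # <exp> ::= <term> { + <term> | - <term> }
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.peek()
            self.pos += 1
            node = (op, node, self.term())
        return node

    def term(self):                      # <term> ::= <factor> { * <factor> | DIV <factor> }
        node = self.factor()
        while self.peek() in ("*", "DIV"):
            op = self.peek()
            self.pos += 1
            node = (op, node, self.factor())
        return node

    def factor(self):                    # <factor> ::= id | int | ( <exp> )
        tok = self.peek()
        if tok == "(":
            self.pos += 1
            node = self.exp()
            self.expect(")")
            return node
        if tok is not None and tok.isdigit():
            self.pos += 1
            return int(tok)
        return self.ident()

def parse(tokens):
    parser = Parser(tokens)
    tree = parser.stmt()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree

print(parse("VARIANCE := SUMSQ DIV 100 - MEAN * MEAN".split()))
```

Because the loops in exp and term attach each new operand to the tree built so far, the result is left-associative, matching the original left-recursive rules. The statement A + / B is rejected with a SyntaxError.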
5.5 Code Generation
When the parser recognizes a portion of the source program according to some rule of the grammar, the corresponding routine is executed. These routines are called semantic routines or code generation routines.
1. Operator precedence parser
The routine is called when a substring is reduced to a nonterminal.
2. Recursive descent parser
The routine is called when a procedure returns to its caller, indicating success.
<start> ::= <term>
<term>  ::= id | <term>+<term> | <term>-<term>

<term> ::= <term>1 + <term>2
    MOV AX, <term>1
    ADD AX, <term>2
    MOV <term>, AX
<term> ::= <term>1 - <term>2
    MOV AX, <term>1
    SUB AX, <term>2
    MOV <term>, AX
<term> ::= id
    add id to <term>
Figure: parse tree for id + id - id (<start>, <term> nodes and the terminals id, +, id, -, id).
Generating assembly instructions or machine code directly is too fine-grained, so the program is first translated into an intermediate form.
5.6 Intermediate Form
Three Address Code (Quadruple Form)
(operator,operand1 , operand2 , Result)
<term> ::= <term>1 + <term>2
(+, <term>1, <term>2, <term>)
<term> ::= <term>1 - <term>2
(-, <term>1, <term>2, <term>)
<term> ::= id
add id to <term>
Variance := sumsq DIV 100 - mean * mean
(DIV, sumsq, #100, i1)
(*,   mean,  mean, i2)
(-,   i1,    i2,   i3)
(:=,  i3,    ,     variance)
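The quadruples for this statement can be produced by a post-order walk over the expression tree, creating a new temporary for each operator. The sketch below assumes the tree is given as nested tuples (operator, left, right) with strings for identifiers and Python ints for integer constants; the function name to_quadruples and this tree encoding are illustrative choices, not part of the slides.

```python
def to_quadruples(target, tree):
    """Return the quadruples that evaluate `tree` and assign it to `target`."""
    quads = []
    counter = 0

    def emit(node):
        nonlocal counter
        if isinstance(node, int):
            return f"#{node}"            # integer constants are written as #n
        if isinstance(node, str):
            return node                  # identifiers are used as they are
        op, left, right = node
        operand1 = emit(left)            # operands first (post-order)
        operand2 = emit(right)
        counter += 1
        temp = f"i{counter}"             # intermediate results i1, i2, ...
        quads.append((op, operand1, operand2, temp))
        return temp

    result = emit(tree)
    quads.append((":=", result, "", target))
    return quads

tree = ("-", ("DIV", "sumsq", 100), ("*", "mean", "mean"))
for quad in to_quadruples("variance", tree):
    print(quad)
```

For the tree of sumsq DIV 100 - mean * mean this yields the same four quadruples, with i1, i2 and i3 as the intermediate results.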
5.7 Machine Independent Compiler Features
1. Storage Allocation
* Static allocation: storage is allocated at compile time.
* Dynamic allocation: storage is allocated at run time.
    Auto: function calls (stack)
    Controlled: malloc( ), free( ) (heap)
2. Activation Record
Each function call creates an activation record that contains storage for all the variables used by the function, the return address, etc.
Figure: a stack of activation records; each record holds Variables, Return Address, Next and Previous fields.
Figure: activation records on the stack, in three steps.
(1) The operating system calls MAIN: the stack holds one record with the MAIN variables and a return address to the OS.
(2) MAIN calls SUB: a record with the SUB variables and a return address into MAIN is pushed on top of the record for MAIN.
(3) SUB calls SUB: a further record with a return address into SUB is pushed on top.
3. Prologue and Epilogue
The compiler must generate additional code to
manage the activation records themselves.
a. Prologue
The code to create a new activation record
b. Epilogue
The code to delete the current activation record
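The effect of the prologue and epilogue can be imitated with a small Python model of the run-time stack. This is a simplified sketch with hypothetical class and method names: each record keeps only a link to the previous record, and return addresses are represented by plain strings.

```python
class ActivationRecord:
    def __init__(self, name, return_address, previous):
        self.name = name
        self.variables = {}                  # storage for the function's variables
        self.return_address = return_address
        self.previous = previous             # link to the caller's record

class RuntimeStack:
    def __init__(self):
        self.top = None

    def prologue(self, name, return_address):
        """Create a new activation record on top of the stack."""
        self.top = ActivationRecord(name, return_address, self.top)
        return self.top

    def epilogue(self):
        """Delete the current activation record and return its return address."""
        record = self.top
        self.top = record.previous
        return record.return_address

    def frames(self):
        """Names of the active functions, most recent first."""
        names, record = [], self.top
        while record is not None:
            names.append(record.name)
            record = record.previous
        return names

stack = RuntimeStack()
stack.prologue("MAIN", "OS")     # the OS calls MAIN
stack.prologue("SUB", "MAIN")    # MAIN calls SUB
stack.prologue("SUB", "SUB")     # SUB calls itself
print(stack.frames())
print(stack.epilogue())          # the inner SUB returns into SUB
print(stack.frames())
```

Because every call gets its own record, the recursive call of SUB has separate variables from the outer call.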
4. Structure Variables
Array, Record, String, Set, ...
B: array[0..3,0..1] of integer
Row-major order:
B[0][0] B[0][1] B[1][0] B[1][1] B[2][0] B[2][1] B[3][0] B[3][1]
Column-major order:
B[0][0] B[1][0] B[2][0] B[3][0] B[0][1] B[1][1] B[2][1] B[3][1]
Type B[a..b][c..d]
Address of B[s][t]
Row major:    [(s - a) * (d - c + 1) + (t - c)] * sizeof(Type) + Base address
Column major: [(t - c) * (b - a + 1) + (s - a)] * sizeof(Type) + Base address
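The two address formulas translate directly into code. In the Python sketch below the function names are hypothetical, and the base address 1000 and element size 4 in the example are arbitrary assumptions chosen only to make the arithmetic visible.

```python
def row_major_address(base, size, a, b, c, d, s, t):
    """Address of B[s][t] for B[a..b][c..d] stored row by row."""
    return ((s - a) * (d - c + 1) + (t - c)) * size + base

def column_major_address(base, size, a, b, c, d, s, t):
    """Address of B[s][t] for B[a..b][c..d] stored column by column."""
    return ((t - c) * (b - a + 1) + (s - a)) * size + base

# B: array[0..3,0..1] of integer, assumed base address 1000 and 4-byte integers.
print(row_major_address(1000, 4, 0, 3, 0, 1, s=2, t=1))
print(column_major_address(1000, 4, 0, 3, 0, 1, s=2, t=1))
```

B[2][1] is the sixth element in row-major order (offset 5 * 4 = 20) and the seventh in column-major order (offset 6 * 4 = 24), so the two calls give 1020 and 1024.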
5. Code Optimization
For I := 1 to 10
Begin
    x[I, 2*J-1] := T[I, 2*J];
    Table[I] := 2**I;
END
a. Common sub-expression: 2*J is computed only once (T1).
b. Loop invariants: T1 and T2 do not change inside the loop, so they are computed before it.
c. Reduction in strength: 2**I is replaced by a repeated multiplication (K := K * 2).
T1 := 2*J;
T2 := T1 - 1;
K := 1;
For I := 1 to 10
Begin
    x[I, T2] := T[I, T1];
    K := K * 2;
    Table[I] := K;
END
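That the optimized loop computes the same results as the original can be checked with a direct transcription into Python. The dictionaries standing in for the arrays x, T and Table, the value J = 3 and the test data are assumptions made only for this check.

```python
def unoptimized(T, J):
    x, table = {}, {}
    for I in range(1, 11):
        x[I, 2 * J - 1] = T[I, 2 * J]
        table[I] = 2 ** I
    return x, table

def optimized(T, J):
    x, table = {}, {}
    T1 = 2 * J            # common sub-expression, hoisted out of the loop
    T2 = T1 - 1           # loop invariant
    K = 1
    for I in range(1, 11):
        x[I, T2] = T[I, T1]
        K = K * 2         # strength reduction: K equals 2**I after this step
        table[I] = K
    return x, table

J = 3
T = {(I, 2 * J): 100 * I for I in range(1, 11)}   # arbitrary test data
print(unoptimized(T, J) == optimized(T, J))
```

Both versions fill x and Table identically; the optimized one performs one multiplication per iteration instead of an exponentiation and two evaluations of 2*J.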
5.8 Compiler Design Option
1. Interpreter
An interpreter processes a source program written in a high-level language, just as a compiler does. The main difference is that an interpreter executes a version of the source program directly instead of translating it into machine code.
An interpreter can be viewed as a set of functions whose execution is driven by the internal form of the program.
2. P Code Compiler
* P-code is byte code, a machine-independent language.
* It can be executed across platforms, on different kinds of computers.
Figure: Source Program -> Byte Code -> Java Interpreter / Java Run Module -> Run
3. Compiler-Compiler
A software tool that can be used to help in the task of compiler construction.
Uses finite state automata
YACC: parser generator (Unix)
LEX: scanner generator (Unix)
4. Cross Compiler
Figure: a cross compiler running on a workstation translates the source program into 80XX machine code; that code is then run on a personal computer (80XX machine).