Transcript Document

Ştefan Stăncescu
PART I
SYSTEM UTILITIES
Lecture 6
Compilers
COMPILERS
High-level language (HLL):
with complex grammar laws,
closer to human language
HLL: the means of the human ↔ computer link
human language → binary language
HLL → binary language
COMPILER: an automatic translation machine
Source code => in the HLL
Object code => in binary language (machine code)
COMPILATION: according to the HLL grammar laws
• lexical laws
type and structure of the language elements
• syntactic laws
composition rules for the language elements
• "semantic" laws (translation programs)
the object-code counterpart of each syntactic law,
i.e. "semantic programs" for the machine
Compiling = analysis + translation of the HLL source text
• lexical laws → scanner
• syntactic laws → parser
• "semantic" laws → object code generator
(for a VM: intermediate code, "bytecode")
SCANNER: identifies tokens
• tokens: language elements made of one or more adjacent characters,
separated by delimiter characters (SP, LF, FF, etc.)
• words: START, STOP, LABEL01
• operators: + * /
• special signs: ( ) { } // . ,
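The boundary rule above can be sketched as a small Python tokenizer. This is a minimal sketch, assuming a hypothetical lexical pattern (the `TOKEN_RE` regular expression: the two-character operator `:=`, single special signs, and words made of letters and digits). Whitespace such as SP, LF and FF only separates tokens; any other unmatched character is reported as an error.

```python
import re

# Hypothetical lexical pattern (an assumption, not the lecture's): ":=" is
# tried first, then single special signs, then words of letters/digits.
TOKEN_RE = re.compile(r":=|[+\-*/(){}.,;:=]|[A-Za-z0-9_]+")


def scan(text: str) -> list[str]:
    """Split source text into tokens at whitespace and special-sign boundaries."""
    tokens = []
    pos = 0
    for m in TOKEN_RE.finditer(text):
        # Whatever lies between two matches must be whitespace (a boundary);
        # anything else is a character the assumed lexical rules do not allow.
        gap = text[pos:m.start()]
        if gap.strip():
            raise ValueError(f"illegal character(s) {gap.strip()!r}")
        tokens.append(m.group())
        pos = m.end()
    if text[pos:].strip():
        raise ValueError(f"illegal character(s) {text[pos:].strip()!r}")
    return tokens


print(scan("START LABEL01: S := A+B*(C-D); STOP"))
```

Trying `:=` before the single-character signs keeps the assignment operator in one piece instead of splitting it into `:` and `=`.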
SCANNER step I
scan the HLL source text
determine the token list by boundaries
identify the HLL tokens
identify the programmer-invented tokens
create a look-up table with
numerical symbols for the tokens
SCANNER step II
create an intermediate source file
in which the tokens are replaced by
numerical symbols from the
look-up table created in step I
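Steps I and II can be illustrated together. The sketch assumes a hypothetical list of HLL tokens (`HLL_TOKENS`) with fixed codes. Every other token is treated as programmer-invented and numbered in order of first appearance. The helper names `build_lookup_table` and `encode` are illustrative only.

```python
# Hypothetical HLL tokens (reserved words and operators) with fixed codes
# 1..11; the real list depends on the language being compiled.
HLL_TOKENS = ["PROGRAM", "VAR", "BEGIN", "END", "FOR", "TO", "DO",
              ":=", "+", "-", ";"]


def build_lookup_table(tokens):
    """Step I (hypothetical helper): map every distinct token to a number.

    HLL tokens keep their fixed codes; programmer-invented tokens
    (identifiers, literals) get the next free code on first appearance.
    """
    table = {tok: code for code, tok in enumerate(HLL_TOKENS, start=1)}
    for tok in tokens:
        if tok not in table:
            table[tok] = len(table) + 1
    return table


def encode(tokens, table):
    """Step II (hypothetical helper): the intermediate source as numbers."""
    return [table[tok] for tok in tokens]


tokens = ["FOR", "I", ":=", "1", "TO", "N", "DO"]
table = build_lookup_table(tokens)
print(encode(tokens, table))
```

Here `I`, `1` and `N` are the programmer-invented tokens. They receive codes 12, 13 and 14, after the 11 fixed codes.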
BNF: Backus-Naur Form
representation of syntactic rules
A rule (law) in BNF format ↔ a valid construction in the HLL
↔ a formatted template of
a rule applied to a line of the source file
(or a rule applied to the lines of a line list)
Syntactic rule ↔ valid construction in the HLL
A template has the name of
the newly built and checked element,
which can be part of other constructions
(including one with the same pattern)
Name of the new construction ↔ "nonterminal" symbol
BNF rule form:
<nonterminal symbol> ::= building template
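To illustrate the BNF form, rules 10-12 of the simplified Pascal grammar given later in this lecture can be stored as data: each nonterminal maps to its alternative templates. This encoding is an assumption made for the example. It also shows how the terminals fall out: they are the symbols that never appear on the left of ::=.

```python
# Rules 10-12 as data (illustrative encoding): nonterminal -> alternatives,
# each alternative being the list of symbols in its building template.
GRAMMAR = {
    "<exp>": [["<term>"], ["<exp>", "+", "<term>"], ["<exp>", "-", "<term>"]],
    "<term>": [["<factor>"], ["<term>", "*", "<factor>"],
               ["<term>", "DIV", "<factor>"]],
    "<factor>": [["id"], ["int"], ["(", "<exp>", ")"]],
}


def split_symbols(grammar):
    """Hypothetical helper: return (nonterminals, terminals) of a grammar.

    Nonterminals are the names defined on the left of ::=; every other
    symbol used inside a template is a terminal (a token).
    """
    nonterminals = set(grammar)
    used = {sym for alternatives in grammar.values()
            for alt in alternatives for sym in alt}
    return nonterminals, used - nonterminals


nts, ts = split_symbols(GRAMMAR)
print(sorted(nts))
print(sorted(ts))
```

Note that `<exp>` appears inside its own templates and inside `<factor>`. This is the "construction with the same pattern" case: a rule may build on itself.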
Parsing → discovering in the HLL source file
successive valid BNF rules (templates) until
there are no more undiscovered rules
(no more "nonterminal" symbols)
Parsing ends only on tokens ("terminal" symbols)
Chaining BNF rules (templates) => syntax tree
The purpose of parsing => discovering
the syntax tree of the source file
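A syntax tree can be discovered in several ways. The sketch below uses top-down recursive descent over rules 10-12. This technique is swapped in for illustration only; the lecture's own examples parse bottom-up. Each left-recursive rule becomes a loop, so that + - * DIV group from left to right. The tree is returned as nested (operator, left, right) tuples, which is an assumed representation.

```python
OPERATORS = {"+", "-", "*", "DIV", "(", ")"}


def parse_exp(tokens):
    """Hypothetical recursive-descent parser for <exp>; returns a syntax tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def exp():      # <exp> ::= <term> | <exp> + <term> | <exp> - <term>
        node = term()
        while peek() in ("+", "-"):
            op = take()
            node = (op, node, term())
        return node

    def term():     # <term> ::= <factor> | <term> * <factor> | <term> DIV <factor>
        node = factor()
        while peek() in ("*", "DIV"):
            op = take()
            node = (op, node, factor())
        return node

    def factor():   # <factor> ::= id | int | ( <exp> )
        tok = take()
        if tok == "(":
            node = exp()
            if take() != ")":
                raise SyntaxError("expected )")
            return node
        if tok in OPERATORS:
            raise SyntaxError(f"unexpected {tok!r}")
        return tok  # terminal symbol: identifier or integer constant

    tree = exp()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {tokens[pos]!r}")
    return tree


print(parse_exp(["A", "+", "B", "*", "C", "-", "D"]))
```

Parsing stops only on terminal symbols, which become the leaves of the tree. Every inner node records which template (rule) was recognized.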
Line in the source file: S = A + B
(A, B, S: integer variables, i.e. tokens)
The code generator must explain
to the machine the templates found
The scanner identifies the tokens
"S" "=" "A" "+" "B"
tokens "A", "B", "S" as variables
token "+" as operator, token "=" as assignment
The parser also verifies the coherence of
the variables, i.e. whether they have the same type
(if A, B, S are all integers: OK)
if one is different, the templates for "+" and
"=" need a conversion to a coherent type
Ex: if S is real and A, B are integers:
"+" rule OK, result integer
"=" (assignment rule) adds a
format conversion integer => real (float)
1st parser operation: consistency of the structures
(conversion, if needed)
2nd parser operation: A+B
(result in temporary memory)
3rd parser operation: assigning the result to S
(S=A+B)
Applicable BNF rules:
conversion, addition, assignment, in that order
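The three operations can be sketched as a generator of quadruples (operation, operand 1, operand 2, result) for S = A + B. Several details are assumptions the slide does not cover:

- The conversion operation name ITOR is hypothetical.
- The temporaries t1, t2, ... are hypothetical.
- A REAL sum assigned to an INTEGER variable is rejected.

The consistency check runs first. The conversion it decides on is applied to the sum just before the assignment.

```python
import itertools


def gen_assign_sum(target, left, right, types):
    """Hypothetical helper: quadruples for target := left + right.

    types maps each variable name to "INTEGER" or "REAL".
    """
    code = []
    temps = (f"t{n}" for n in itertools.count(1))   # hypothetical temporaries

    # I: consistency check, deciding the type of the sum.
    sum_type = "REAL" if "REAL" in (types[left], types[right]) else "INTEGER"
    if sum_type == "REAL" and types[target] == "INTEGER":
        # Assumption: REAL -> INTEGER assignment is rejected, not truncated.
        raise TypeError(f"REAL value assigned to INTEGER variable {target}")

    operands = []
    for name in (left, right):
        if types[name] != sum_type:          # INTEGER operand in a REAL sum
            tmp = next(temps)
            code.append(("ITOR", name, None, tmp))   # hypothetical int->real op
            operands.append(tmp)
        else:
            operands.append(name)

    # II: the addition, result kept in temporary memory.
    result = next(temps)
    code.append(("+", operands[0], operands[1], result))

    # The assignment rule adds integer => real conversion when S is REAL.
    if types[target] != sum_type:
        tmp = next(temps)
        code.append(("ITOR", result, None, tmp))
        result = tmp

    # III: the assignment itself.
    code.append((":=", result, None, target))
    return code


for quad in gen_assign_sum("S", "A", "B",
                           {"S": "REAL", "A": "INTEGER", "B": "INTEGER"}):
    print(quad)
```

For the slide's case (S real, A and B integer), the sketch emits the integer addition, then one conversion, then the assignment.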
EXAMPLE II (bottom-up parsing)
S=A+B*C-D
scan the line and discover the operations to be performed first;
each result becomes a "nonterminal" symbol <N>
=>
Operator precedence: ( + <. * ) | ( * .> - )
Assuming the rules of algebraic expressions:
Syntactic algebraic rule of multiplication
<product> ::= <agent> * <agent>
Syntactic rule of addition
<sum> ::= (<agent> + <agent>) | (<agent> - <agent>)
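The relations + <. * and * .> - can be encoded with numeric priorities, an assumption made for this sketch. An incoming operator first forces reduction of every stacked operator with greater or equal priority (equal priority reduces left to right); the incoming operator is then shifted. Each reduction produces a new nonterminal N1, N2, ...

```python
# Assumed numeric encoding of the precedence relations:
# a higher number binds tighter ("*" .> "+", "-").
PRIORITY = {"+": 1, "-": 1, "*": 2}


def reduce_expression(tokens):
    """Hypothetical helper: list reductions (N, left, op, right) in order."""
    operands, operators, reductions = [], [], []

    def reduce_top():
        op = operators.pop()
        right = operands.pop()
        left = operands.pop()
        name = f"N{len(reductions) + 1}"
        reductions.append((name, left, op, right))
        operands.append(name)          # the result becomes a nonterminal <N>

    for tok in tokens:
        if tok in PRIORITY:
            # Stacked operator .> incoming one (or equal, left-associative):
            # reduce it before going on.
            while operators and PRIORITY[operators[-1]] >= PRIORITY[tok]:
                reduce_top()
            operators.append(tok)      # stacked operator <. incoming: shift
        else:
            operands.append(tok)
    while operators:                   # end of line: reduce what is left
        reduce_top()
    return reductions


for r in reduce_expression(["A", "+", "B", "*", "C", "-", "D"]):
    print(r)
```

On A+B*C-D the reductions come out as N1 = B*C, N2 = A+N1, N3 = N2-D. The multiplication is found first, exactly as the precedence relations require.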
EXAMPLE II (bottom-up parsing)
<N1> ::= B*C
<N2> ::= A+<N1>
<N3> ::= <N2>-D
Syntactic tree of expression A+B*C-D
EXAMPLE II (bottom-up parsing)
S=A+(B*C-D)
S=ATTRIB(N3)
N3=SUM(A,N2)
N2=SCAD(N1,D)   (SCAD: subtraction)
N1=PROD(B,C)
Syntactic tree of expression A+B*C-D
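The functional notation can be produced mechanically by a post-order walk of the tree, which names each inner node as it is reduced. The nested-tuple encoding of the tree and the helper names `linearize` and `evaluate` are assumptions made for this sketch. `evaluate` checks that this grouping, A+(B*C-D), has the same value as the left-to-right grouping (A+B*C)-D.

```python
import itertools

# The tree from this slide as nested tuples (illustrative encoding).
TREE = ("ATTRIB", "S", ("SUM", "A", ("SCAD", ("PROD", "B", "C"), "D")))


def linearize(tree):
    """Hypothetical helper: post-order walk naming inner nodes N1, N2, ..."""
    lines = []
    counter = itertools.count(1)

    def walk(node):
        if isinstance(node, str):              # leaf: a variable token
            return node
        op, *args = node
        names = [walk(arg) for arg in args]    # children are reduced first
        if op == "ATTRIB":                     # assignment: (target, value)
            lines.append(f"{names[0]}=ATTRIB({names[1]})")
            return names[0]
        name = f"N{next(counter)}"
        lines.append(f"{name}={op}({','.join(names)})")
        return name

    walk(tree)
    return lines


OPS = {"SUM": lambda x, y: x + y,
       "SCAD": lambda x, y: x - y,
       "PROD": lambda x, y: x * y}


def evaluate(node, env):
    """Hypothetical helper: value of an expression subtree."""
    if isinstance(node, str):
        return env[node]
    op, left, right = node
    return OPS[op](evaluate(left, env), evaluate(right, env))


print(linearize(TREE))
```

The walk lists the nodes bottom-up (N1 first). The slide lists the same lines top-down, starting from S=ATTRIB(N3).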
STANDARD PROGRAM IN SIMPLIFIED PASCAL
1   MEDIA ANALYSIS PROGRAM
2   VAR
3      NRCRT, I: INTEGER;
       SARITM, SARMON, DIF: REAL
4   BEGIN
5      SARITM := 0;
6      SARMON := 0;
7      FOR I := 1 TO 100 DO
8      BEGIN
9         READ (NRCRT);
10        SARITM := SARITM + NRCRT;
11        SARMON := SARMON + 1 DIV NRCRT
12     END;
13     DIF := SARITM DIV 100 - 100 DIV SARMON;
14     WRITE (DIF)
15  END.
GRAMMAR (BNF) OF THE SIMPLIFIED PASCAL LANGUAGE
1.  <prog>       ::= PROGRAM <prog_name> VAR <dec_list> BEGIN <stmt_list> END.
2.  <prog_name>  ::= id
3.  <dec_list>   ::= <dec> | <dec_list> ; <dec>
4.  <dec>        ::= <id_list> : <type>
5.  <type>       ::= INTEGER | REAL
6.  <id_list>    ::= id | <id_list> , id
7.  <stmt_list>  ::= <stmt> | <stmt_list> ; <stmt>
8.  <stmt>       ::= <assign> | <read> | <write> | <for>
9.  <assign>     ::= id := <exp>
10. <exp>        ::= <term> | <exp> + <term> | <exp> - <term>
11. <term>       ::= <factor> | <term> * <factor> | <term> DIV <factor>
12. <factor>     ::= id | int | ( <exp> )
13. <read>       ::= READ ( <id_list> )
14. <write>      ::= WRITE ( <id_list> )
15. <for>        ::= FOR <index_exp> DO <body>
16. <index_exp>  ::= id := <exp> TO <exp>
17. <body>       ::= <stmt> | BEGIN <stmt_list> END
Token name   Code
PROGRAM      1
VAR          2
BEGIN        3
END.         4
END          5
INTEGER      6
REAL         7
READ         8
WRITE        9
FOR          10
TO           11
DO           12
;            13
:            14
,            15
:=           16
+            17
-            18
DIV          19
(            20
)            21
ID           22
INT          23
File produced by the scanner
LINE   TOKEN   SPECIFIER
1      1
       22      ^STATUS
...
7      10
       22      ^I
       16
       23      #1
       11
       23      #100
       12
(^: pointer to the symbol-table entry; #: integer value)
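The scanner's output file can be imitated with the token codes from the table above. In this sketch:

- the regular expression `TOKEN_RE` is an assumed lexical pattern;
- `scan_line` is a hypothetical helper;
- specifiers are rendered as strings, `^name` standing for a symbol-table pointer and `#value` for an integer;
- unrecognized characters are silently skipped, whereas a real scanner would report them.

```python
import re

# Token codes from the lecture's table.
CODES = {"PROGRAM": 1, "VAR": 2, "BEGIN": 3, "END.": 4, "END": 5,
         "INTEGER": 6, "REAL": 7, "READ": 8, "WRITE": 9, "FOR": 10,
         "TO": 11, "DO": 12, ";": 13, ":": 14, ",": 15, ":=": 16,
         "+": 17, "-": 18, "DIV": 19, "(": 20, ")": 21}
ID, INT = 22, 23

# Assumed lexical pattern: "END." is tried before identifiers and ":="
# before ":", so the longer tokens win.
TOKEN_RE = re.compile(r"END\.|:=|[;:,+\-()]|[A-Za-z][A-Za-z0-9]*|\d+")


def scan_line(line_no, text):
    """Hypothetical helper: (line, token code, specifier) rows for one line."""
    rows = []
    for tok in TOKEN_RE.findall(text):
        if tok in CODES:                       # reserved word or sign
            rows.append((line_no, CODES[tok], None))
        elif tok.isdigit():                    # integer constant
            rows.append((line_no, INT, "#" + tok))
        else:                                  # programmer-invented identifier
            rows.append((line_no, ID, "^" + tok))
    return rows


for row in scan_line(7, "FOR I := 1 TO 100 DO"):
    print(row)
```

For program line 7 the rows match the file above: codes 10, 22 (^I), 16, 23 (#1), 11, 23 (#100), 12.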
STANDARD PROGRAM, line 9: READ (NRCRT);
Applicable BNF rules:
13. <read>    ::= READ ( <id_list> )
6.  <id_list> ::= id | <id_list> , id
STANDARD PROGRAM, line 13: DIF := SARITM DIV 100 - 100 DIV SARMON;
Applicable BNF rules (only the alternatives used here):
9.  <assign> ::= id := <exp>
10. <exp>    ::= <term> | <exp> - <term>
11. <term>   ::= <factor> | <term> DIV <factor>
12. <factor> ::= id | int | ( <exp> )
Precedence matrix of the simplified Pascal grammar: relations <. , .=. and .> between ordered pairs of the tokens PROGRAM, VAR, BEGIN, END., END, INTEGER, REAL, READ, WRITE, FOR, TO, DO, ;, :, ,, :=, +, -, DIV, (, ), ID, INT
PROGRAM .=. VAR
BEGIN <. FOR
; .> END.
Empty pairs (no relation) mark grammatical errors
Each pair of tokens has at most one precedence relation
(consistency of the grammar)
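A precedence matrix with many empty cells can be kept as a sparse table keyed by token pairs. This sketch loads only the three example relations above, using the hypothetical helpers `build_table` and `relation`. A missing pair is reported as a grammatical error. A second, different relation for the same pair is rejected as an inconsistent grammar.

```python
def build_table(entries):
    """Hypothetical helper: sparse precedence table from (left, rel, right).

    A second, different relation for the same ordered pair means the
    grammar is not a consistent precedence grammar.
    """
    table = {}
    for left, rel, right in entries:
        old = table.get((left, right))
        if old is not None and old != rel:
            raise ValueError(
                f"inconsistent grammar: {left} {old} {right} and {left} {rel} {right}")
        table[(left, right)] = rel
    return table


def relation(table, left, right):
    """Hypothetical helper: relation between two adjacent tokens.

    An empty cell (missing pair) is a grammatical error.
    """
    rel = table.get((left, right))
    if rel is None:
        raise SyntaxError(f"grammatical error: {left!r} followed by {right!r}")
    return rel


# Only the three example entries from this slide; the full matrix is larger.
TABLE = build_table([("PROGRAM", ".=.", "VAR"),
                     ("BEGIN", "<.", "FOR"),
                     (";", ".>", "END.")])
print(relation(TABLE, "BEGIN", "FOR"))
```

A dictionary can hold only one value per key. The consistency rule is therefore enforced when the table is built, not when it is consulted.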
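Generating semantic programs
DIF := SARITM DIV 100 - 100 DIV SARMON
Bottom-up reductions:
id1 := id2 DIV int - int DIV id4
id1 := exp1 - exp2     (exp1 = id2 DIV int, exp2 = int DIV id4)
id1 := exp3            (exp3 = exp1 - exp2)
Generated quadruples (operation, operand 1, operand 2, result):
DIV   SARITM   #100     i1
DIV   #100     SARMON   i2
-     i1       i2       i3
:=    i3       ,        DIF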
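The quadruples above can be obtained from a post-order walk of the syntax tree. Both operands of a node are generated before the node itself, and each intermediate result goes to a fresh temporary i1, i2, ... The tuple encoding of the tree and the helper name `gen_quads` are assumptions made for this sketch.

```python
import itertools

# Syntax tree of  DIF := SARITM DIV 100 - 100 DIV SARMON  as nested tuples
# (assumed encoding): ints are constants, strings are variable names.
TREE = (":=", "DIF", ("-", ("DIV", "SARITM", 100), ("DIV", 100, "SARMON")))


def gen_quads(tree):
    """Hypothetical helper: quadruples (op, arg1, arg2, result) for a tree."""
    quads = []
    temps = (f"i{n}" for n in itertools.count(1))

    def operand(node):
        if isinstance(node, int):              # constant, written #value
            return f"#{node}"
        if isinstance(node, str):              # variable name
            return node
        op, left, right = node
        a = operand(left)                      # post-order: operands first
        b = operand(right)
        result = next(temps)
        quads.append((op, a, b, result))
        return result

    _, target, value = tree                    # the root is the assignment :=
    source = operand(value)
    quads.append((":=", source, None, target))
    return quads


for q in gen_quads(TREE):
    print(q)
```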
(1)  :=     #0       ,        SARITM   {SARITM:=0}
(2)  :=     #0       ,        SARMON   {SARMON:=0}
(3)  :=     #1       ,        I        {FOR I:=1 TO 100}
(4)  JGT    I        #100     (15)
(5)  CALL   XREAD                      {READ(NRCRT)}
(6)  PARAM  NRCRT
(7)  +      SARITM   NRCRT    i1       {SARITM:=SARITM+NRCRT}
(8)  :=     i1       ,        SARITM
(9)  DIV    #1       NRCRT    i2       {SARMON:=SARMON+1 DIV NRCRT}
(10) +      SARMON   i2       i3
(11) :=     i3       ,        SARMON
(12) +      I        #1       i4       {end of FOR}
(13) :=     i4       ,        I
(14) J      (4)
(15) DIV    SARITM   #100     i6       {DIF:=SARITM DIV 100 - 100 DIV SARMON}
(16) DIV    #100     SARMON   i7
(17) -      i6       i7       i8
(18) :=     i8       ,        DIF
(19) CALL   XWRITE                     {WRITE(DIF)}
(20) PARAM  DIF
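A minimal interpreter shows what the quadruple listing computes. Several semantics are assumptions not stated on the slides:

- #n is the constant n.
- JGT a, b, (k) jumps to quadruple k when a > b.
- J (k) jumps unconditionally.
- A CALL takes its argument from the PARAM that follows it.
- DIV is integer division, written with Python `//`. For the non-negative values used here, this matches truncation.

The helper name `run` is hypothetical.

```python
# Quadruples from the listing: (operation, arg1, arg2, result or jump target).
PROGRAM = [
    (":=", "#0", None, "SARITM"),      # (1)
    (":=", "#0", None, "SARMON"),      # (2)
    (":=", "#1", None, "I"),           # (3)
    ("JGT", "I", "#100", 15),          # (4)
    ("CALL", "XREAD", None, None),     # (5)
    ("PARAM", "NRCRT", None, None),    # (6)
    ("+", "SARITM", "NRCRT", "i1"),    # (7)
    (":=", "i1", None, "SARITM"),      # (8)
    ("DIV", "#1", "NRCRT", "i2"),      # (9)
    ("+", "SARMON", "i2", "i3"),       # (10)
    (":=", "i3", None, "SARMON"),      # (11)
    ("+", "I", "#1", "i4"),            # (12)
    (":=", "i4", None, "I"),           # (13)
    ("J", 4, None, None),              # (14)
    ("DIV", "SARITM", "#100", "i6"),   # (15)
    ("DIV", "#100", "SARMON", "i7"),   # (16)
    ("-", "i6", "i7", "i8"),           # (17)
    (":=", "i8", None, "DIF"),         # (18)
    ("CALL", "XWRITE", None, None),    # (19)
    ("PARAM", "DIF", None, None),      # (20)
]


def run(program, inputs):
    """Hypothetical interpreter: execute quadruples, return values written."""
    mem, written, feed = {}, [], iter(inputs)

    def value(arg):
        # "#n" is a constant; anything else names a variable or temporary.
        return int(arg[1:]) if arg.startswith("#") else mem[arg]

    arith = {"+": lambda a, b: a + b,
             "-": lambda a, b: a - b,
             "DIV": lambda a, b: a // b}      # integer division (assumption)
    pc, routine = 1, None                     # quadruples are numbered from 1
    while pc <= len(program):
        op, a, b, r = program[pc - 1]
        pc += 1
        if op == ":=":
            mem[r] = value(a)
        elif op in arith:
            mem[r] = arith[op](value(a), value(b))
        elif op == "JGT":
            if value(a) > value(b):
                pc = r
        elif op == "J":
            pc = a
        elif op == "CALL":
            routine = a                       # argument comes from next PARAM
        elif op == "PARAM":
            if routine == "XREAD":
                mem[a] = next(feed)
            elif routine == "XWRITE":
                written.append(mem[a])
            routine = None
        else:
            raise ValueError(f"unknown operation {op}")
    return written


print(run(PROGRAM, [1] * 75 + [5] * 25))
```

Take 75 readings equal to 1 and 25 equal to 5. Then SARITM = 200 and SARMON = 75, so DIF = 200 DIV 100 - 100 DIV 75 = 2 - 1 = 1. Because 1 DIV NRCRT is 0 for every NRCRT > 1, SARMON stays 0 if no reading equals 1. In that case 100 DIV SARMON divides by zero, and Python raises ZeroDivisionError.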