Chapter 8
Intermediate Code
Basic Code Generation Techniques
Gang S. Liu
College of Computer Science & Technology
Harbin Engineering University
Introduction
• The final task of a compiler is to generate executable
code for a target machine that faithfully represents
the semantics of the source code.
• This is the most complex phase of a compiler.
– It depends on detailed information about
• the target architecture,
• the structure of the runtime environment,
• the operating system.
• The code generator also attempts to optimize the speed and
size of the target code by taking advantage of special
features of the target machine (registers, addressing
modes, pipelining, and cache memory).
Compiler Construction
[email protected]
Introduction (cont)
• Code generation is typically broken into
several steps, often including an abstract code
called intermediate code.
• Two popular forms are
1. Three-address code
2. P-code
Intermediate Code
• A data structure that represents the source program
during translation is called an intermediate
representation (IR).
• An abstract syntax tree was used as the principal IR.
• An abstract syntax tree does not resemble target
code.
– Example: control flow constructs.
• A new form of IR is necessary.
• An intermediate representation that closely
resembles target code is called intermediate code.
Form of Intermediate Code
• Intermediate code is a linearization of the syntax
tree.
• Intermediate code
– Can be very high level, representing operations almost
as abstractly as the syntax tree, or can closely resemble
target code.
– May or may not use detailed information about the
target machine and runtime environment.
Use of Intermediate Code
• Intermediate code is useful
– For producing extremely efficient code
– In making a compiler more easily retargetable (if
intermediate code is relatively target independent).
[Figure: Source Language 1 and Source Language 2 are both translated into a common Intermediate Code, which is then translated into Target Language 1 and Target Language 2.]
Three-Address Code
• The most basic instruction of three-address code has the form
x = y op z
• The address x is used differently from the addresses
of y and z: x receives the result, while y and z are operands.
• y and z (but not x) can represent constants and literal values.
Example
2*a+(b-3)
[Syntax tree: + at the root; its left child is * with operands 2 and a; its right child is - with operands b and 3]
Left-to-right linearization
t1=2*a
t2=b-3
t3=t1+t2
Right-to-left linearization
t1=b-3
t2=2*a
t3=t2+t1
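The two linearizations of 2*a+(b-3) can be reproduced with a short Python sketch. This is only an illustration, not the lecture's algorithm as given: the nested-tuple tree encoding and the function name gen_tac are assumptions made here. The tree is walked in postorder, and each operator node emits one instruction into a fresh temporary.

```python
from itertools import count

def gen_tac(tree, left_to_right=True):
    """Return a list of three-address instructions for an expression tree.

    A tree is either a leaf (a name or constant, given as a string)
    or a tuple (op, left, right). This encoding is hypothetical.
    """
    code = []
    temps = count(1)

    def walk(node):
        if not isinstance(node, tuple):
            return node                      # a leaf is its own address
        op, left, right = node
        if left_to_right:
            l = walk(left)
            r = walk(right)
        else:
            r = walk(right)
            l = walk(left)
        t = f"t{next(temps)}"                # fresh temporary for the result
        code.append(f"{t}={l}{op}{r}")
        return t

    walk(tree)
    return code

tree = ("+", ("*", "2", "a"), ("-", "b", "3"))
print(gen_tac(tree))                       # ['t1=2*a', 't2=b-3', 't3=t1+t2']
print(gen_tac(tree, left_to_right=False))  # ['t1=b-3', 't2=2*a', 't3=t2+t1']
```

Only the order in which the two subtrees are visited changes between the two runs; the operand order inside each instruction stays the same.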
Three-Address Code (cont)
• The form of three-address code must be varied
to express all constructs (e.g. the unary operation t2=-t1).
• No standard form exists.
Implementation of Three-Address Code
• Each three-address instruction is implemented as
a record structure containing several fields.
• The entire sequence is an array or a linked list.
• The most common implementation requires four
fields (a quadruple):
– One for the operation and three for the addresses.
• For instructions that need fewer
addresses, one or more address fields are given
null or “empty” values.
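A quadruple record might be sketched in Python as follows. The class name Quad, the field names, and the operation name "mul" are assumptions chosen for illustration; unused address fields default to None and are printed as "_".

```python
from typing import NamedTuple, Optional

class Quad(NamedTuple):
    """One three-address instruction: an operation and up to three addresses."""
    op: str
    addr1: Optional[str] = None
    addr2: Optional[str] = None
    addr3: Optional[str] = None

    def __str__(self):
        fields = [self.op, self.addr1, self.addr2, self.addr3]
        # Empty address fields are shown as "_"
        return "(" + ", ".join("_" if f is None else f for f in fields) + ")"

# The instruction sequence is kept as an array (a Python list).
code = [
    Quad("mul", "2", "a", "t1"),   # t1 = 2*a
    Quad("rd", "x"),               # an instruction that needs only one address
]
for q in code:
    print(q)
# (mul, 2, a, t1)
# (rd, x, _, _)
```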
Factorial Program
{ Sample program
in TINY language computes factorial }
read x; { input an integer }
if 0 < x then { don't compute if x <= 0 }
fact := 1;
repeat
fact := fact * x;
x := x - 1
until x = 0;
write fact { output factorial of x }
end
Syntax Tree for Factorial Program
Example
Source program:
{ Sample program
in TINY language computes factorial }
read x; { input an integer }
if 0 < x then { don't compute if x <= 0 }
fact := 1;
repeat
fact := fact * x;
x := x - 1
until x = 0;
write fact { output factorial of x }
end
Quadruples:
(rd, x, _, _)
(gt, x, 0, t1)
(if_f, t1, L1, _)
(asn, 1, fact, _)
(lab, L2, _, _)
(mul, fact, x, t2)
(asn, t2, fact, _)
(sub, x, 1, t3)
(asn, t3, x, _)
(eq, x, 0, t4)
(if_f, t4, L2, _)
(wri, fact, _, _)
(lab, L1, _, _)
(halt, _, _, _)
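One way to check the quadruples for the factorial program is to execute them. The following is a minimal Python sketch of an interpreter, assuming the field layout used on the slide (for asn the second field is the target, for if_f the second field is the label). The function name run_quads and the convention that input comes from a list are assumptions.

```python
def run_quads(quads, inputs):
    """Interpret a list of quadruples; return the list of written values."""
    labels = {q[1]: i for i, q in enumerate(quads) if q[0] == "lab"}
    env, out = {}, []
    inputs = iter(inputs)

    def val(a):
        # An address is an integer literal or a variable/temporary name.
        return int(a) if a.lstrip("-").isdigit() else env[a]

    pc = 0
    while True:
        op, a, b, c = quads[pc]
        pc += 1
        if op == "rd":
            env[a] = next(inputs)
        elif op == "gt":
            env[c] = val(a) > val(b)
        elif op == "eq":
            env[c] = val(a) == val(b)
        elif op == "mul":
            env[c] = val(a) * val(b)
        elif op == "sub":
            env[c] = val(a) - val(b)
        elif op == "asn":
            env[b] = val(a)
        elif op == "if_f":
            if not val(a):
                pc = labels[b]           # jump when the condition is false
        elif op == "wri":
            out.append(val(a))
        elif op == "halt":
            return out
        # "lab" only marks a position, so it needs no action

FACTORIAL = [
    ("rd", "x", "_", "_"),
    ("gt", "x", "0", "t1"),
    ("if_f", "t1", "L1", "_"),
    ("asn", "1", "fact", "_"),
    ("lab", "L2", "_", "_"),
    ("mul", "fact", "x", "t2"),
    ("asn", "t2", "fact", "_"),
    ("sub", "x", "1", "t3"),
    ("asn", "t3", "x", "_"),
    ("eq", "x", "0", "t4"),
    ("if_f", "t4", "L2", "_"),
    ("wri", "fact", "_", "_"),
    ("lab", "L1", "_", "_"),
    ("halt", "_", "_", "_"),
]

print(run_quads(FACTORIAL, [5]))   # [120]
print(run_quads(FACTORIAL, [0]))   # [] because the body is skipped
```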
Different Representation
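• The instructions themselves are used to represent temporaries:
an operand can refer to the result of an earlier instruction by
that instruction's index.
• This reduces the number of address fields from
three to two.
• Such a representation is called a triple.
• The amount of space is reduced.
• Major drawback: because triples refer to one another by position,
moving instructions becomes difficult in an array representation.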
Example
Quadruples              Triples
(rd, x, _, _)           (0) (rd, x, _)
(gt, x, 0, t1)          (1) (gt, x, 0)
(if_f, t1, L1, _)       (2) (if_f, (1), (11))
(asn, 1, fact, _)       (3) (asn, 1, fact)
(lab, L2, _, _)
(mul, fact, x, t2)      (4) (mul, fact, x)
(asn, t2, fact, _)      (5) (asn, (4), fact)
(sub, x, 1, t3)         (6) (sub, x, 1)
(asn, t3, x, _)         (7) (asn, (6), x)
(eq, x, 0, t4)          (8) (eq, x, 0)
(if_f, t4, L2, _)       (9) (if_f, (8), (4))
(wri, fact, _, _)       (10) (wri, fact, _)
(lab, L1, _, _)
(halt, _, _, _)         (11) (halt, _, _)
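The conversion from quadruples to triples shown in this example can be sketched in Python. This is an illustration under stated assumptions rather than a prescribed algorithm: the function name quads_to_triples is made up here, labels are assumed to be dropped and replaced by the index of the next instruction, and names of temporaries and labels are assumed not to clash with variable names.

```python
def quads_to_triples(quads):
    """Convert quadruples to triples that refer to instructions by index."""
    # Pass 1: drop label definitions; remember where each label and
    # temporary ends up in the triple sequence.
    position = {}                 # label or temporary name -> triple index
    kept = []
    for op, a, b, c in quads:
        if op == "lab":
            position[a] = len(kept)       # index of the next real instruction
        else:
            if c != "_":
                position[c] = len(kept)   # temporary defined by this instruction
            kept.append((op, a, b))

    # Pass 2: replace names of temporaries and labels with "(index)".
    def ref(addr):
        return f"({position[addr]})" if addr in position else addr

    return [(op, ref(a), ref(b)) for op, a, b in kept]

QUADS = [
    ("rd", "x", "_", "_"),
    ("gt", "x", "0", "t1"),
    ("if_f", "t1", "L1", "_"),
    ("asn", "1", "fact", "_"),
    ("lab", "L2", "_", "_"),
    ("mul", "fact", "x", "t2"),
    ("asn", "t2", "fact", "_"),
    ("sub", "x", "1", "t3"),
    ("asn", "t3", "x", "_"),
    ("eq", "x", "0", "t4"),
    ("if_f", "t4", "L2", "_"),
    ("wri", "fact", "_", "_"),
    ("lab", "L1", "_", "_"),
    ("halt", "_", "_", "_"),
]

for i, t in enumerate(quads_to_triples(QUADS)):
    print(f"({i}) ({', '.join(t)})")
```

The printed triples agree with the listing above, for instance (2) (if_f, (1), (11)) and (9) (if_f, (8), (4)). The position table is also what makes reordering awkward: moving one triple changes the indices that other triples depend on.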
P-Code
• Standard target assembly code produced by
Pascal compilers in the 1970s and 1980s.
• Designed for a hypothetical stack machine, called the P-machine.
• Interpreters were written for actual machines.
• This made Pascal compilers easily portable.
– Only the interpreter must be rewritten for a new platform.
• Modifications of P-code are used in a number of
compilers, mostly for Pascal-like languages.
P-Machine
• Consists of
– A code memory
– An unspecified data memory for named variables
– A stack for temporary data
– Registers needed to maintain the stack and support
execution.
Example 1
2*a+(b-3)
ldc 2    ; load constant 2
lod a    ; load value of variable a
mpi      ; integer multiplication
lod b    ; load value of variable b
ldc 3    ; load constant 3
sbi      ; integer subtraction
adi      ; integer addition
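The P-code for 2*a+(b-3) is the postorder traversal of the expression tree: operands are loaded first and the operator follows. A minimal Python sketch, assuming a nested-tuple tree encoding and the hypothetical function name gen_pcode:

```python
OPS = {"+": "adi", "-": "sbi", "*": "mpi"}

def gen_pcode(node):
    """Return P-code for an expression tree by postorder traversal.

    A node is a tuple (op, left, right), an int constant,
    or a variable name given as a string.
    """
    if isinstance(node, tuple):
        op, left, right = node
        # Operands first, then the operator: the stack holds the temporaries.
        return gen_pcode(left) + gen_pcode(right) + [OPS[op]]
    if isinstance(node, int):
        return [f"ldc {node}"]    # load constant
    return [f"lod {node}"]        # load value of a variable

tree = ("+", ("*", 2, "a"), ("-", "b", 3))
print("\n".join(gen_pcode(tree)))
```

No temporary names are generated, in contrast with three-address code.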
Example 2
x:=y+1
lda x    ; load address of x
lod y    ; load value of y
ldc 1    ; load constant 1
adi      ; add
sto      ; store top to address below top & pop both
Factorial Program
lda x      ; load address of x
rdi        ; read an integer, store to address on top of the stack (& pop it)
lod x      ; load the value of x
ldc 0      ; load constant 0
grt        ; pop and compare top two values, push the Boolean result
fjp L1     ; pop Boolean value, jump to L1 if false
lda fact   ; load address of fact
ldc 1      ; load constant 1
sto        ; pop two values, storing the first to address represented by second
lab L2     ; definition of label L2
lda fact   ; load address of fact
lod fact   ; load value of fact
lod x      ; load value of x
mpi        ; multiply
sto        ; store top to address of second & pop
lda x      ; load address of x
lod x      ; load value of x
ldc 1      ; load constant 1
sbi        ; subtract
sto        ; store the result to x
lod x      ; load value of x
ldc 0      ; load constant 0
equ        ; test for equality
fjp L2     ; jump to L2 if false
lod fact   ; load value of fact
wri        ; write the value on top of the stack
lab L1     ; definition of label L1
stp        ; stop
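The factorial P-code can be tried out on a toy stack machine. The Python sketch below is an assumption-laden illustration, not the actual P-machine: addresses are modeled simply as variable names, data memory is a dictionary, and the function name run_pcode and the list-based input are invented for this example.

```python
def run_pcode(source, inputs):
    """Run P-code text on a toy stack machine; return the written values."""
    prog = []
    for line in source.splitlines():
        fields = line.split(";")[0].split()   # drop comments, split fields
        if fields:
            prog.append(fields)
    labels = {ins[1]: i for i, ins in enumerate(prog) if ins[0] == "lab"}
    memory, stack, out = {}, [], []
    inputs = iter(inputs)
    binary = {
        "adi": lambda a, b: a + b,
        "sbi": lambda a, b: a - b,
        "mpi": lambda a, b: a * b,
        "grt": lambda a, b: a > b,
        "equ": lambda a, b: a == b,
    }
    pc = 0
    while True:
        op, *args = prog[pc]
        pc += 1
        if op == "lda":
            stack.append(args[0])             # an address is just the name
        elif op == "lod":
            stack.append(memory[args[0]])
        elif op == "ldc":
            stack.append(int(args[0]))
        elif op in binary:
            b = stack.pop()
            a = stack.pop()
            stack.append(binary[op](a, b))
        elif op == "sto":
            value = stack.pop()
            memory[stack.pop()] = value       # pops both value and address
        elif op == "rdi":
            memory[stack.pop()] = next(inputs)
        elif op == "fjp":
            if not stack.pop():
                pc = labels[args[0]]
        elif op == "wri":
            out.append(stack.pop())
        elif op == "stp":
            return out
        # "lab" only defines a label, so it needs no action

FACTORIAL = """
lda x
rdi
lod x
ldc 0
grt
fjp L1
lda fact
ldc 1
sto
lab L2
lda fact
lod fact
lod x
mpi
sto
lda x
lod x
ldc 1
sbi
sto
lod x
ldc 0
equ
fjp L2
lod fact
wri
lab L1
stp
"""

print(run_pcode(FACTORIAL, [5]))   # [120]
```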
P-Code and Three-Address Code
• P-code
– Is closer to actual machine code.
– Instructions require fewer addresses.
• “One-address” or “zero-address”
– Is less compact in terms of the number of instructions.
– Is not “self-contained”.
• Instructions operate implicitly on a stack.
• All temporary values are on the stack, so there is no need for temporary
names.
Generation of Target Code
• Involves two standard techniques
1. Macro expansion
– Replaces each intermediate code instruction with an
equivalent sequence of target code instructions.
2. Static simulation
– Straight-line simulation of the effects of the intermediate
code, generating target code to match these effects.
Example
Grammar:
exp → id = exp | aexp
aexp → aexp + factor | factor
factor → (exp) | num | id
Expression: (x=x+3)+4
P-code:
lda x
lod x
ldc 3
adi
stn
ldc 4
adi
Three-address code:
t1 = x+3
x = t1
t2 = t1+4
Static Simulation
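P-code: lda x, lod x, ldc 3, adi, stn, ldc 4, adi
Current instruction: adi (after lda x, lod x, ldc 3 have been simulated)
Stack before, from the top: 3, x, address of x
Three-address instruction generated: t1=x+3
Stack after, from the top: t1, address of x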
Static Simulation
P-code: lda x, lod x, ldc 3, adi, stn, ldc 4, adi
Current instruction: stn
Stack before, from the top: t1, address of x
Three-address instruction generated: x=t1
Stack after, from the top: t1
Static Simulation
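P-code: lda x, lod x, ldc 3, adi, stn, ldc 4, adi
Current instruction: the final adi (after ldc 4 has been simulated)
Stack before, from the top: 4, t1
Three-address instruction generated: t2=t1+4
Stack after, from the top: t2
Resulting three-address code:
t1 = x+3
x = t1
t2 = t1+4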
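The three simulation steps can be condensed into a short Python sketch that translates straight-line P-code into three-address code by keeping a compile-time stack of names rather than values. The function name simulate is an assumption, and only the instructions that occur in this example (lda, lod, ldc, adi, stn) are handled.

```python
def simulate(pcode):
    """Translate straight-line P-code into three-address code."""
    stack, code, ntemps = [], [], 0
    for ins in pcode:
        op, *args = ins.split()
        if op == "lda":
            stack.append(args[0])        # stands for "address of <name>"
        elif op in ("lod", "ldc"):
            stack.append(args[0])        # a variable's value or a constant
        elif op == "adi":
            right = stack.pop()
            left = stack.pop()
            ntemps += 1
            temp = f"t{ntemps}"          # new temporary names the result
            code.append(f"{temp} = {left}+{right}")
            stack.append(temp)
        elif op == "stn":
            value = stack.pop()
            target = stack.pop()
            code.append(f"{target} = {value}")
            stack.append(value)          # stn leaves the stored value on the stack
    return code

pcode = ["lda x", "lod x", "ldc 3", "adi", "stn", "ldc 4", "adi"]
print("\n".join(simulate(pcode)))
# t1 = x+3
# x = t1
# t2 = t1+4
```

Nothing is executed: the stack records where each value would live, and that is enough to emit the matching three-address instructions.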
Macro Expansion
Three-address code:
t1 = x+3
x = t1
t2 = t1+4
P-code after macro expansion:
t1 = x+3 expands to
lda t1
lod x
ldc 3
adi
sto
x = t1 expands to
lda x
lod t1
sto
t2 = t1+4 expands to
lda t2
lod t1
ldc 4
adi
sto
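Macro expansion treats each three-address instruction in isolation and replaces it with a fixed P-code template. A minimal Python sketch, assuming instructions are given as strings and that only plain copies and additions occur, as in this example (the function name expand is hypothetical):

```python
def expand(instr):
    """Expand one three-address instruction into a list of P-code instructions."""
    target, expr = [s.strip() for s in instr.split("=")]

    def load(operand):
        # Constants are loaded with ldc, variables and temporaries with lod.
        return f"ldc {operand}" if operand.isdigit() else f"lod {operand}"

    code = [f"lda {target}"]                  # address that will receive the result
    if "+" in expr:
        left, right = [s.strip() for s in expr.split("+")]
        code += [load(left), load(right), "adi"]
    else:
        code.append(load(expr))               # plain copy such as x = t1
    code.append("sto")
    return code

for instr in ["t1 = x+3", "x = t1", "t2 = t1+4"]:
    print(instr, "->", expand(instr))
```

The result has 13 P-code instructions, compared with the 7 in the original P-code for (x=x+3)+4: temporaries become real memory locations, and the reuse that stn allowed is lost.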