Syntax Analysis


Compiler Model

[Figure: compiler pipeline.
Source Program → Lexical Analyzer → (tokens) → Syntax Analyzer → (AST) → Semantic Analyzer → Intermediate Code Generator → (IL) → Code Optimizer → (IC) → Target Code Generator → Object Program
Front-End − language-dependent part
Back-End − machine-dependent part]
Intermediate Language
[1/34]

Why an IL Is Needed

Modular Construction
Automatic Construction
Easy Translation
Portability
Optimization
Bootstrapping

Classification of ILs

Polish Notation --- Postfix, IR
Three Address Code --- Quadruple, Triple, Indirect triple
Tree Structured Code --- PT, AST, TCOL
Abstract Machine Code --- P-code, EM-code, U-code, Byte-code

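Two-Level Code Generation

[Figure: Source → Front-End → ILS → ILS-ILT → ILT → Back-End → Target]

ILS
  a form that can be obtained from the source automatically
  source-language dependent and high level
ILT
  a form that an automated back end can translate to the target machine very easily
  target-machine dependent and low level
ILS to ILT
  translating ILS into ILT is the main task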

The Polish mathematician Jan Łukasiewicz invented the parenthesis-free notation.

Postfix (Suffix) Polish Notation

earliest IL
popular for interpreted languages - SNOBOL, BASIC

general form :

e1 e2 ... ek OP (k ≥ 1)
where OP : k-ary operator
      ei : any postfix expression (1 ≤ i ≤ k)

example :
if a then if c-d then a+c else a*c else a+b
⇒ a L1 BZ c d - L2 BZ a c + L3 BR
L2: a c * L3 BR L1: a b + L3:

note
1) high level: source to IL - fast & easy translation
               IL to target - difficult
2) easy evaluation - operand stack
3) unsuitable for optimization - translation to another IL is required
4) parentheses-free notation - arithmetic expressions

suitable for interpretive languages
[Figure: Source → Translator → Postfix → Evaluator → Result]
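
The "easy evaluation - operand stack" note can be illustrated with a minimal Python sketch. It assumes binary arithmetic operators only and whitespace-separated tokens; the function name `eval_postfix` and the variable environment `env` are hypothetical names chosen for this illustration, not part of the slides.

```python
import operator

# Binary operators supported by this sketch (an assumption; the general
# form on the slide allows k-ary operators).
BINARY_OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_postfix(tokens, env):
    """Evaluate a list of postfix tokens with an operand stack."""
    stack = []
    for tok in tokens:
        if tok in BINARY_OPS:
            right = stack.pop()          # the second operand is on top
            left = stack.pop()
            stack.append(BINARY_OPS[tok](left, right))
        elif tok in env:
            stack.append(env[tok])       # variable: push its value
        else:
            stack.append(float(tok))     # numeric literal
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


# "b c d * e / +" is the postfix form of "b + c * d / e".
values = {"b": 1, "c": 6, "d": 4, "e": 3}
print(eval_postfix("b c d * e / +".split(), values))  # 9.0
```

No parentheses or precedence rules are needed during evaluation: the order of the tokens already fixes the order of the operations.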


Three Address Code

most popular IL; used in optimizing compilers
General form:
A := B op C
where
  A : result address
  B, C : operand addresses
  op : operator
(1) Quadruple - 4-tuple notation
<operator>,<operand1>,<operand2>,<result>
(2) Triple - 3-tuple notation
<operator>,<operand1>,<operand2>
(3) Indirect triple - execution order table & triples

example

a = b + c * d / e;
f = c * d;

quadruple           triple
(*, c, d, t1)       1. (*, c, d)
(/, t1, e, t2)      2. (/, (1), e)
(+, b, t2, t3)      3. (+, b, (2))
(=, t3, , a)        4. (=, a, (3))
(*, c, d, t4)       5. (*, c, d)
(=, t4, , f)        6. (=, f, (5))

Indirect triple
operations      triples
1. (1)          (1) (*, c, d)
2. (2)          (2) (/, (1), e)
3. (3)          (3) (+, b, (2))
4. (4)          (4) (=, a, (3))
5. (1)          (5) (=, f, (1))
6. (5)
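
The quadruple and triple listings for this example can be reproduced with a short Python sketch. It is only an illustration under stated assumptions: it borrows Python's standard `ast` module as a stand-in parser (the two statements happen to be valid Python as well), handles only `name = expression` with the four arithmetic operators, and the function names `to_quadruples` and `to_triples` are hypothetical.

```python
import ast

OP_SYMBOL = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def to_quadruples(source):
    """Translate simple assignments into (op, arg1, arg2, result) tuples."""
    quads = []
    temp_count = 0

    def gen(node):
        nonlocal temp_count
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return str(node.value)
        if isinstance(node, ast.BinOp):
            left = gen(node.left)        # operands first (post-order walk)
            right = gen(node.right)
            temp_count += 1
            temp = f"t{temp_count}"      # fresh temporary for the result
            quads.append((OP_SYMBOL[type(node.op)], left, right, temp))
            return temp
        raise NotImplementedError(type(node).__name__)

    for stmt in ast.parse(source).body:
        if not isinstance(stmt, ast.Assign):
            raise NotImplementedError("only assignments are handled")
        value = gen(stmt.value)
        quads.append(("=", value, "", stmt.targets[0].id))
    return quads


def to_triples(quads):
    """Replace each temporary by a reference to the triple that computes it."""
    triples = []
    temp_to_ref = {}
    for op, arg1, arg2, result in quads:
        a1 = temp_to_ref.get(arg1, arg1)
        a2 = temp_to_ref.get(arg2, arg2)
        if op == "=":
            triples.append(("=", result, a1))
        else:
            triples.append((op, a1, a2))
            temp_to_ref[result] = f"({len(triples)})"
    return triples


quads = to_quadruples("a = b + c * d / e\nf = c * d")
for quad in quads:
    print(quad)
for number, triple in enumerate(to_triples(quads), start=1):
    print(number, triple)
```

The sketch does not detect that `c * d` is computed twice; sharing triple (1) between both statements, as the indirect-triple table does, would be a separate optimization step.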

Note

Quadruple vs. Triple
  quadruple - easy to optimize
  triple - removes temporary addresses
  ⇒ Indirect Triple

extensive code optimization is easy
IL can be rearranged (except for triples)
easy translation - source to IL
difficult to generate good code
  quadruple to two-address machine
  triple to three-address machine

Abstract Syntax Tree

removes redundant information from the parse tree.

Leaf node -- variable name, constant
Internal node -- operator

[Example 9.8] Text p.386
{
x = 0;
y = z + 2 * y;
while ((x<n) && (v[x] != z)) x = x+1;
return x;
}
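
As a rough illustration of the leaf/internal-node distinction, the following Python sketch builds an AST by hand for the statement `y = z + 2 * y` from the example and prints it in a parenthesized linear form. The `Node` class, the operator labels (`assign`, `add`, `mul`) and the `linear` helper are hypothetical names for this sketch, not the textbook's representation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                               # operator, variable name, or constant
    children: list = field(default_factory=list)

    def is_leaf(self):
        return not self.children


def linear(node):
    """Render the tree as a parenthesized prefix expression."""
    if node.is_leaf():
        return node.label                    # leaf: variable name or constant
    inner = " ".join(linear(child) for child in node.children)
    return f"({node.label} {inner})"         # internal node: operator


# y = z + 2 * y
tree = Node("assign", [
    Node("y"),
    Node("add", [Node("z"), Node("mul", [Node("2"), Node("y")])]),
])
print(linear(tree))  # (assign y (add z (mul 2 y)))
```

Punctuation such as `;` and the grouping imposed by grammar nonterminals do not appear in the tree; only operators and operands remain.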

Tree Structured Common Language (TCOL)

A variant of the AST - contains the result of semantic analysis.
TCOL operator - type & context specific operator
Context
┌ value     --- rhs of assignment statement
├ location  --- lhs of assignment statement
├ boolean   --- conditional control statement
└ statement --- statement
ex)
.     : operand - location
        result - value
while : operand - boolean, statement
        result - statement
Example) int a; float b;
...
b = a + 1;

AST:                  TCOL:
   assign                assign
   /    \                /    \
  b     add             b    float
       /   \                   |
      a     1                 addi
                             /    \
                            .      1
                            |
                            a

Representation ---- graph orientation
internal notation ----- efficient
external notation ---- debug, interface
    linear graph notation
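
One way to picture how semantic analysis turns the AST into the TCOL tree is the Python sketch below. It is an assumption-laden illustration, not the TCOL definition: trees are nested tuples, only `assign` and `add` are handled, the operator name `addf` for floating-point addition is hypothetical (the slide shows only `addi` and `float`), and `to_tcol` is an invented helper name.

```python
def to_tcol(node, types, context="value"):
    """Return (tcol_tree, type) for an AST given as nested tuples or leaf strings."""
    if isinstance(node, str):
        if node.isdigit():
            return node, "int"                   # integer constant
        if context == "location":
            return node, types[node]             # lhs: the location itself
        return (".", node), types[node]          # rhs: "." yields the value
    op, lhs, rhs = node
    if op == "assign":
        target, target_type = to_tcol(lhs, types, "location")
        value, value_type = to_tcol(rhs, types, "value")
        if value_type != target_type:
            value = (target_type, value)         # conversion operator, e.g. float
        return ("assign", target, value), target_type
    if op == "add":
        left, left_type = to_tcol(lhs, types)
        right, right_type = to_tcol(rhs, types)
        if left_type == right_type == "int":
            return ("addi", left, right), "int"  # type-specific operator
        if left_type == "int":
            left = ("float", left)
        if right_type == "int":
            right = ("float", right)
        return ("addf", left, right), "float"    # hypothetical float add
    raise NotImplementedError(op)


ast_tree = ("assign", "b", ("add", "a", "1"))
tcol_tree, _ = to_tcol(ast_tree, {"a": "int", "b": "float"})
print(tcol_tree)
# ('assign', 'b', ('float', ('addi', ('.', 'a'), '1')))
```

The output mirrors the TCOL tree in the example: `b` stays a location, `a` is wrapped in `.` because a value is required, the addition becomes `addi`, and a `float` conversion is inserted before the assignment.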

Pascal P Compiler --- portable compiler producing P_CODE for an abstract machine (P_Machine).
P_Machine --- hypothetical stack machine designed for the Pascal language.
(1) Instruction --- closely related to the Pascal language.
(2) Registers
    PC --- program counter
    NP --- new pointer
    SP --- stack pointer
    MP --- mark pointer
(3) Memory
    CODE --- instruction part
    STORE --- data part (constant area, stack, heap)
[Figure: P_Machine memory layout. CODE is addressed by PC. STORE contains the stack (MP: current activation record, SP), the heap (NP), and the constant area.]

Ucode
the intermediate form used by the Stanford Portable Pascal compiler.
It is stack-based and defined in terms of a hypothetical stack machine.
Ucode Interpreter : Appendix B.

Addressing
stack addressing ===> a tuple : (B, O)
  B : the block number containing the address
  O : the offset in words from the beginning of the block; offsets start at 1.
label
  Any Ucode instruction can be labeled through its label field.
  All targets of jumps and procedures must be labeled.
  All labels must be unique for the entire program.

Example :

Consider the following skeleton :
int x;
void main()
{
int i;
int j;
// .....
}

block number
- global variables : 1
- local variables inside a function : 2

variable addressing
- x : (1,1)
- i : (2,1)
- j : (2,2)
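
The address assignment above can be sketched in a few lines of Python. The sketch assumes every variable occupies exactly one word and that there is a single function, so block 1 holds the globals and block 2 the locals; `assign_addresses` is a hypothetical helper name.

```python
def assign_addresses(global_vars, local_vars):
    """Map each variable name to a (block, offset) pair; offsets start at 1."""
    table = {}
    for offset, name in enumerate(global_vars, start=1):
        table[name] = (1, offset)    # block 1: global variables
    for offset, name in enumerate(local_vars, start=1):
        table[name] = (2, offset)    # block 2: locals of the function
    return table


print(assign_addresses(["x"], ["i", "j"]))
# {'x': (1, 1), 'i': (2, 1), 'j': (2, 2)}
```

Arrays or other multi-word objects would advance the offset by their size in words rather than by one.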

Ucode Operations (39 in total)

Unary               --- notop, neg, inc, dec, dup
Binary              --- add, sub, mult, div, mod, swp,
                        and, or, gt, lt, ge, le, eq, ne
Stack Operations    --- lod, str, ldc, lda
Control Flow        --- ujp, tjp, fjp
Range Checking      --- chkh, chkl
Indirect Addressing --- ldi (load indirect), sti (store indirect)
Procedure           --- call, ret, retv, ldp, proc, end
Etc.                --- nop, bgn, sym

Example :

x = a + b * c;
        lod 1 1     /* a */
        lod 1 2     /* b */
        lod 1 3     /* c */
        mult
        add
        str 1 4     /* x */

if (a>b) a = a + b;
        lod 1 1     /* a */
        lod 1 2     /* b */
        gt
        fjp next
        lod 1 1     /* a */
        lod 1 2     /* b */
        add
        str 1 1     /* a */
next    ...
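
To see how such code runs on a stack machine, here is a deliberately small Python sketch of an evaluator for a handful of the operations used above. It is not the Ucode interpreter from Appendix B: the instruction encoding as `(label, opcode, operands)` tuples, the `memory` dictionary keyed by `(block, offset)`, and the name `run_ucode` are all assumptions of this illustration, and `nop` stands in for the elided instruction at label `next`.

```python
def run_ucode(program, memory):
    """Run a tiny Ucode subset; `memory` maps (block, offset) to a value."""
    labels = {label: i for i, (label, _, _) in enumerate(program) if label}
    stack = []
    pc = 0
    while pc < len(program):
        _, op, args = program[pc]
        pc += 1
        if op == "lod":
            stack.append(memory[tuple(args)])    # push the variable's value
        elif op == "str":
            memory[tuple(args)] = stack.pop()    # pop into the variable
        elif op in ("add", "mult", "gt"):
            right = stack.pop()
            left = stack.pop()
            if op == "add":
                stack.append(left + right)
            elif op == "mult":
                stack.append(left * right)
            else:
                stack.append(int(left > right))
        elif op == "fjp":
            if not stack.pop():                  # jump when the top is false
                pc = labels[args[0]]
        elif op == "ujp":
            pc = labels[args[0]]
        elif op == "nop":
            pass
        else:
            raise NotImplementedError(op)
    return memory


# x = a + b * c;
assign = [
    ("", "lod", (1, 1)), ("", "lod", (1, 2)), ("", "lod", (1, 3)),
    ("", "mult", ()), ("", "add", ()), ("", "str", (1, 4)),
]
print(run_ucode(assign, {(1, 1): 2, (1, 2): 3, (1, 3): 4})[(1, 4)])  # 14

# if (a>b) a = a + b;
conditional = [
    ("", "lod", (1, 1)), ("", "lod", (1, 2)), ("", "gt", ()),
    ("", "fjp", ("next",)),
    ("", "lod", (1, 1)), ("", "lod", (1, 2)), ("", "add", ()),
    ("", "str", (1, 1)),
    ("next", "nop", ()),
]
print(run_ucode(conditional, {(1, 1): 5, (1, 2): 3})[(1, 1)])  # 8
```

When `a` is not greater than `b`, `fjp` skips the four instructions of the assignment and `a` keeps its value.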

Indirect Addressing

is used to access array elements.
ldi --- indirect load
replaces stacktop by the value of the item at location stacktop.
to retrieve A[i] :
        lod i       // actually (Bi, Oi)
        lda A       // also (block number, offset)
        add         // effective address
        ldi         // indirect load gets contents of A[i]

sti --- indirect store
sti stores stacktop into the address at stack[stacktop-1]; both items are popped.
A[i] = j;
        lod i
        lda A
        add
        lod j
        sti
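
The two indirect operations can be traced with the following Python sketch. It simplifies the machine model on purpose: memory is a flat list of words, a hypothetical `symbols` table maps each name to a flat address instead of a (block, offset) pair, and `run_indirect` is an invented name covering only the five opcodes used in the `A[i]` examples.

```python
def run_indirect(program, memory, symbols):
    """Run lod/lda/add/ldi/sti over a flat memory; returns the final stack."""
    stack = []
    for op, *args in program:
        if op == "lod":
            stack.append(memory[symbols[args[0]]])   # push the value
        elif op == "lda":
            stack.append(symbols[args[0]])           # push the address
        elif op == "add":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)               # effective address
        elif op == "ldi":
            address = stack.pop()
            stack.append(memory[address])            # replace top by contents
        elif op == "sti":
            value = stack.pop()                      # stacktop: value to store
            address = stack.pop()                    # below it: target address
            memory[address] = value
        else:
            raise NotImplementedError(op)
    return stack


symbols = {"i": 0, "j": 1, "A": 2}       # A occupies addresses 2..6
memory = [3, 42, 0, 0, 0, 0, 0]          # i = 3, j = 42

# A[i] = j;
store = [("lod", "i"), ("lda", "A"), ("add",), ("lod", "j"), ("sti",)]
run_indirect(store, memory, symbols)
print(memory[5])  # 42

# retrieve A[i]
load = [("lod", "i"), ("lda", "A"), ("add",), ("ldi",)]
print(run_indirect(load, memory, symbols))  # [42]
```

After `sti` both the value and the address are gone from the stack, whereas `ldi` leaves exactly one item, the loaded element.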

Procedure Calling Sequence

function definition :
        void func(int x, int array[]) { }
function call :
        func(a, list);
calling sequence :
        ldp             // load parameter
        lod a           // load the value of actual parameter
        lda list        // load the address of actual parameter
        call func       // call func