Intermediate Code Generation


Professor Yihjia Tsai Tamkang University

Introduction

• Intermediate representation (IR)
  – Generally a program for an abstract machine (can be assembly language or slightly above)
  – Easy to produce and easy to translate into target code
• Why?
  – When a retargetable compiler is needed
    • i.e., if we are planning a portable compiler with different back ends
  – Better/easier for some optimizations
    • Machine code can be more complex

Sanath Jayasena/Apr 2006   7-2

[Figure: source languages Java, ML, Pascal and C and target machines Sparc, MIPS, Pentium and Alpha, shown without and with a shared Intermediate Representation]
7-3

Introduction

… contd

• Front end can do scanning, parsing, semantic analysis and translation to IR
• Back end will then optimize and generate target code
• IR can modularize the task
  – Front end not bothered about machine details
  – Back end not bothered about source language
7-4

Introduction

… contd

• Qualities of a good IR
  – Convenient for the semantic analysis phase to produce
  – Convenient to translate into the machine language of all desired target hardware
  – Each construct has a clear and simple meaning
    • Easy for optimizing transformations
7-5

Intermediate Representations

• Abstract syntax trees
• Postfix notation
• Directed acyclic graphs (DAGs)
• Three-address code (3AC)
7-6

Abstract Syntax Trees

• Also called Intermediate Representation trees (IR trees)
  – Have individual components that describe only very simple things
    • E.g., load, store, add, move, jump
  – E.g., pp. 136-139, Tiger book (see handout)
7-7

Postfix Notation

• For an expression E, inductively:
  1. If E is a variable or constant, the postfix notation for E is E itself
  2. If E is of the form E1 op E2, where op is a binary operator, the postfix notation for E is E1' E2' op, where E1' and E2' are the postfix notations for E1 and E2
  3. If E is of the form (E1), then the postfix notation for E1 is also that for E
     – Parentheses are unnecessary in postfix notation
7-8
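The inductive definition maps directly onto a recursive function. A minimal Python sketch, assuming a hypothetical two-class expression tree (Leaf, BinOp) already produced by some parser; rule 3 needs no case of its own because parentheses only determine the shape of the tree:

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    text: str  # a variable or constant, e.g. "9" or "x"


@dataclass
class BinOp:
    op: str  # binary operator symbol, e.g. "+" or "-"
    left: "Expr"
    right: "Expr"


Expr = Union[Leaf, BinOp]  # hypothetical tree types for this sketch


def postfix(e: Expr) -> str:
    """Postfix notation by the inductive rules.

    Rule 3 (parenthesized E1) has no case here: grouping is already
    encoded in the shape of the tree, so no parentheses are emitted.
    """
    if isinstance(e, Leaf):  # rule 1: a var or const is its own postfix form
        return e.text
    # rule 2: E1 op E2  ->  E1' E2' op
    return postfix(e.left) + postfix(e.right) + e.op


# (9-5)+2 and 9-(5+2) parse to different trees
t1 = BinOp("+", BinOp("-", Leaf("9"), Leaf("5")), Leaf("2"))
t2 = BinOp("-", Leaf("9"), BinOp("+", Leaf("5"), Leaf("2")))
print(postfix(t1))  # 95-2+
print(postfix(t2))  # 952+-
```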

Example

• What are the postfix notations for (9-5)+2 and 9-(5+2)?
  – (9-5)+2 in postfix notation is 95-2+
  – 9-(5+2) in postfix notation is 952+-
7-9

Syntax-Directed Translation

• Translation guided by CFGs
  – Based on "attributes" of language constructs
    • E.g., type, string, number, memory location
  – Attach attributes to grammar symbols
  – Values for attributes computed by semantic rules associated with productions
• Translation of a language construct is expressed in terms of attributes associated with its syntactic components
7-10

Syntax-Directed Translation

… contd

• Two notations for associating semantic rules with productions in a CFG
  1. Syntax-directed definitions
     • High-level specs; details hidden; order of translation unspecified
  2. Translation schemes
     • Order of translation specified; more details shown
• [Dragon book: Section 2.3 and Chapter 5]
7-11

Syntax-Directed Definitions

• For each grammar symbol: associate a set of attributes (synthesized and inherited)
• For each production: a semantic rule defines the value of an attribute at each parse-tree node where that production is used
• Syntax-directed definition = grammar + set of semantic rules
7-12

Annotated Parse Tree

• A parse tree showing the attribute values at each node
• Used for translation (which is an input → output mapping)
  – For input x, construct the parse tree for x
  – If a node n in the tree is labeled by symbol Y
    • The value of attribute p of Y at node n is denoted Y.p
    • The value of Y.p is computed using the semantic rule for attribute p associated with the Y-production used at n
7-13

Synthesized Attributes

• An attribute is synthesized if its value at a parse-tree node is determined from the attribute values at the child nodes
• Can be evaluated with a single bottom-up tree traversal (e.g., a depth-first, post-order traversal)
• A syntax-directed definition that uses synthesized attributes exclusively is said to be an S-attributed definition
7-14

Example 1

Translating expressions into postfix

Production              Semantic Rule
expr → expr1 + term     expr.t := expr1.t || term.t || '+'
expr → expr1 - term     expr.t := expr1.t || term.t || '-'
expr → term             expr.t := term.t
term → 0                term.t := '0'
…                       …
term → 9                term.t := '9'

("||" denotes string concatenation)
7-15

Example 1

… contd

Annotated parse tree corresponding to "9-5+2":

expr.t = 95-2+
  expr.t = 95-
    expr.t = 9
      term.t = 9
        9
    -
    term.t = 5
      5
  +
  term.t = 2
    2
7-16

Example 2

Syntax-directed definition for a desk calculator program

Production      Semantic Rule
L → E $         print(E.val)
E → E1 + T      E.val := E1.val + T.val
E → T           E.val := T.val
T → T1 * F      T.val := T1.val × F.val
T → F           T.val := F.val
F → digit       F.val := digit.lexval

Draw the annotated parse tree for "3*5+4 $"
7-17

Example 2

… contd

Annotated parse tree corresponding to "3*5+4 $":

L
  E.val = 19
    E.val = 15
      T.val = 15
        T.val = 3
          F.val = 3
            digit.lexval = 3
        *
        F.val = 5
          digit.lexval = 5
    +
    T.val = 4
      F.val = 4
        digit.lexval = 4
  $
7-18
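Because every attribute in this definition is synthesized, one post-order walk evaluates the whole annotated tree. A small Python sketch, assuming a hypothetical Node record and a parse tree built by hand (no parser); evaluate applies the semantic rule of whichever production each node uses:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    symbol: str                                  # grammar symbol, e.g. "E", "T", "digit", "+"
    children: List["Node"] = field(default_factory=list)
    lexval: Optional[int] = None                 # set on digit leaves (normally by the lexer)
    val: Optional[int] = None                    # synthesized attribute, filled in bottom-up


def evaluate(n: Node) -> None:
    """Post-order traversal: evaluate all children, then apply n's semantic rule."""
    for c in n.children:
        evaluate(c)
    kids = [c.symbol for c in n.children]
    if n.symbol == "F":                                   # F -> digit
        n.val = n.children[0].lexval
    elif n.symbol in ("E", "T") and len(kids) == 1:       # E -> T   and   T -> F
        n.val = n.children[0].val
    elif n.symbol == "E" and kids == ["E", "+", "T"]:     # E -> E1 + T
        n.val = n.children[0].val + n.children[2].val
    elif n.symbol == "T" and kids == ["T", "*", "F"]:     # T -> T1 * F
        n.val = n.children[0].val * n.children[2].val
    elif n.symbol == "L":                                 # L -> E $
        print(n.children[0].val)


def f_digit(d: int) -> Node:
    """Subtree F -> digit for a single digit d."""
    return Node("F", [Node("digit", lexval=d)])


# Parse tree for "3*5+4 $", built by hand instead of by a parser
tree = Node("L", [
    Node("E", [
        Node("E", [Node("T", [Node("T", [f_digit(3)]), Node("*"), f_digit(5)])]),
        Node("+"),
        Node("T", [f_digit(4)]),
    ]),
    Node("$"),
])
evaluate(tree)  # prints 19
```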

Inherited Attributes

• Value at a node is defined using attributes at the siblings and/or parent of the node
• Useful for tracking the context of a construct
  – E.g., decide whether the address or the value of a variable is needed by keeping track of whether it appears on the RHS or LHS of an assignment
7-19

Example

Syntax-directed definition with inherited attribute L.in

Production      Semantic Rule
D → T L         L.in := T.type
T → int         T.type := integer
T → real        T.type := real
L → L1 , id     L1.in := L.in
                addtype(id.entry, L.in)
L → id          addtype(id.entry, L.in)

Draw the annotated parse tree for "real id1, id2, id3"
7-20

Example

… contd

D
  T.type = real
    real
  L.in = real
    L.in = real
      L.in = real
        id1
      ,
      id2
    ,
    id3

Annotated parse tree for "real id1, id2, id3" with inherited attribute in at each node labeled L
7-21
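One common way to implement an inherited attribute in a hand-written translator is to pass it down as a parameter. A Python sketch for the declaration grammar, assuming a dict stands in for the symbol table and that addtype simply records a type for a name (both hypothetical simplifications); the list of names replaces the recursive L subtree:

```python
from typing import Dict, List

symbol_table: Dict[str, str] = {}  # hypothetical stand-in for the real symbol table


def addtype(entry: str, typ: str) -> None:
    """Record the type of an identifier (simplified addtype)."""
    symbol_table[entry] = typ


def translate_L(ids: List[str], inherited: str) -> None:
    """L -> L1 , id | id, with the inherited attribute L.in passed as a parameter."""
    if len(ids) > 1:
        translate_L(ids[:-1], inherited)   # L1.in := L.in
    addtype(ids[-1], inherited)            # addtype(id.entry, L.in)


def translate_D(type_keyword: str, ids: List[str]) -> None:
    """D -> T L: T.type is synthesized from the keyword, then handed to L as L.in."""
    t_type = {"int": "integer", "real": "real"}[type_keyword]   # T.type
    translate_L(ids, t_type)                                   # L.in := T.type


translate_D("real", ["id1", "id2", "id3"])
print(symbol_table)  # {'id1': 'real', 'id2': 'real', 'id3': 'real'}
```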

Translation Schemes

• Semantic actions embedded within the RHS of productions
  – Unlike syntax-directed definitions, the order of evaluation of semantic rules is shown explicitly
  – The action to be taken is shown by enclosing it in { }
    • E.g., rterm → + term { print('+') } rterm1
  – In a parse tree in this context, an action is shown by an extra child node and a dashed edge
7-22

Depth-First Order

• L-attributed definitions
  – Attributes can always be evaluated in depth-first order (left to right)
• Translation schemes with restrictions motivated by L-attributed definitions ensure that an attribute value is available when an action refers to it
  – E.g., when only synthesized attributes exist
7-23

Example

• Translation scheme that maps infix expressions with addition/subtraction into corresponding postfix expressions

  E → T R
  R → addop T { print(addop.lexeme) } R1
    | subop T { print(subop.lexeme) } R2
    | Λ
  T → num { print(num.val) }

• Show the parse tree for "9-5+2"
7-24

Example

… contd

E
  T
    9
    { print('9') }
  R
    -
    T
      5
      { print('5') }
    { print('-') }
    R
      +
      T
        2
        { print('2') }
      { print('+') }
      R
        Λ

Parse tree for "9-5+2" showing actions; when performed in depth-first order, prints "95-2+"
7-25
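This scheme has no left recursion, so it can drive a predictive (recursive-descent) parser with one procedure per nonterminal, each action running at the point where it appears in its production. A Python sketch, assuming single-digit numbers as the num tokens and collecting the output in a list rather than printing it:

```python
from typing import List


def translate(src: str) -> str:
    """Infix -> postfix with E -> T R,  R -> addop T {print} R | subop T {print} R | Λ,
    T -> num {print}. Assumes one-character tokens (digits, '+', '-'); spaces are ignored."""
    tokens = [ch for ch in src if not ch.isspace()]
    pos = 0
    out: List[str] = []

    def peek() -> str:
        return tokens[pos] if pos < len(tokens) else ""

    def T() -> None:                     # T -> num { print(num.val) }
        nonlocal pos
        tok = peek()
        if not tok.isdigit():
            raise SyntaxError(f"expected a digit at position {pos}")
        pos += 1
        out.append(tok)

    def R() -> None:
        nonlocal pos
        op = peek()
        if op in ("+", "-"):             # addop or subop
            pos += 1
            T()
            out.append(op)               # { print(addop.lexeme) } / { print(subop.lexeme) }
            R()                          # R1 / R2
        # otherwise R -> Λ: no input consumed, no action

    T()                                  # E -> T R
    R()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {peek()!r} at position {pos}")
    return "".join(out)


print(translate("9-5+2"))  # 95-2+
```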

Emitting a Translation

• For simple syntax-directed definitions, implementation is possible with translation schemes where actions print additional strings in the order of their appearance
  – [Simple: the string representing the translation of the nonterminal on the LHS of each production is the concatenation of the translations of the nonterminals on the RHS, in the same order as in the production, with some additional strings (possibly none) interleaved]
7-26

Example

• A translation scheme derived from the Example in slide 7-15

  expr → expr + term   { print('+') }
  expr → expr - term   { print('-') }
  expr → term
  term → 0             { print('0') }
  term → 1             { print('1') }
  …
  term → 9             { print('9') }
7-27

Example

… contd

Actions translating "9-5+2" into "95-2+":

expr
  expr
    expr
      term
        9
        { print('9') }
    -
    term
      5
      { print('5') }
    { print('-') }
  +
  term
    2
    { print('2') }
  { print('+') }
7-28

Constructing Syntax Trees

• Syntax-directed definitions can be used
• Recall: a syntax tree is a condensed form of a parse tree
  – Operators and keywords appear as interior nodes
• Construction: similar to postfix notation
  – For a subexpression, create a node for each operator and operand
  – Children of an operator node represent the operands (as subexpressions) of that operator
7-29

Nodes in a Syntax Tree

• A node is like a record with several fields
  – Label, pointers to operand nodes, value, etc.
• 3 basic functions to create nodes
  – mknode(op, left, right): an operator node with label op and two pointer fields, left and right
  – mkleaf(id, entry): an ID node with label id and a field entry pointing to the symbol-table entry for the identifier
  – mkleaf(num, val): a NUM node with label num and a field val containing the value of the number
7-30

Example

• From Example 5.7, p. 288
  – What is the sequence of calls to create the syntax tree for the expression "a - 4 + c"?

  p1 = mkleaf(id, entry_a);
  p2 = mkleaf(num, 4);
  p3 = mknode('-', p1, p2);
  p4 = mkleaf(id, entry_c);
  p5 = mknode('+', p3, p4);

  What is the syntax tree?
7-31
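The call sequence can be replayed directly to see the resulting tree. A Python sketch in which SymEntry is a hypothetical stand-in for a symbol-table entry, and the slide's two mkleaf forms are merged into one function that dispatches on its first argument:

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class SymEntry:
    name: str                          # hypothetical symbol-table entry, reduced to a name


@dataclass
class Leaf:
    label: str                         # "id" or "num"
    entry: Optional[SymEntry] = None   # used by id leaves
    val: Optional[int] = None          # used by num leaves


@dataclass
class OpNode:
    label: str                         # operator, e.g. "+" or "-"
    left: "Node"
    right: "Node"


Node = Union[Leaf, OpNode]


def mknode(op: str, left: Node, right: Node) -> OpNode:
    return OpNode(op, left, right)


def mkleaf(kind: str, x: Union[SymEntry, int]) -> Leaf:
    # The slide's mkleaf(id, entry) and mkleaf(num, val), merged into one function
    if kind == "id":
        return Leaf("id", entry=x)
    return Leaf("num", val=x)


def show(n: Node) -> str:
    """Fully parenthesized rendering that makes the tree shape visible."""
    if isinstance(n, Leaf):
        return n.entry.name if n.label == "id" else str(n.val)
    return f"({show(n.left)} {n.label} {show(n.right)})"


entry_a, entry_c = SymEntry("a"), SymEntry("c")
p1 = mkleaf("id", entry_a)
p2 = mkleaf("num", 4)
p3 = mknode("-", p1, p2)
p4 = mkleaf("id", entry_c)
p5 = mknode("+", p3, p4)
print(show(p5))  # ((a - 4) + c)
```

The printed form shows that "-" ends up deeper than "+", i.e. the tree groups the expression as (a - 4) + c.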

Constructing Syntax Trees

… contd

• A syntax-directed definition may be used for constructing a syntax tree
  – Semantic rules: calls to the functions mknode() and mkleaf()
  – E.g., for the production E → E1 + T, we may have the semantic rule E.nptr := mknode('+', E1.nptr, T.nptr)
  – Example 5.8, p. 289
7-32

DAGs for Expressions

• A dag for an expression identifies common subexpressions
  – Unlike a syntax tree, a node for a common subexpression may have more than one parent node
  – E.g., "a + a * (b - c) + (b - c) * d"
    • Fig. 5.11, p. 291
• How to create a dag, given an expression?
  – Before creating a node, check whether an identical node already exists; if so, reuse it
7-33
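The check for an identical node can be done with a table keyed on each node's signature (its label plus its children), a technique often called value numbering or hash-consing. A Python sketch under that assumption, using node numbers as pointers; the hypothetical DagBuilder is fed the example expression by hand:

```python
from typing import Dict, List, Tuple

Key = Tuple  # ("id", name) for a leaf, (op, left, right) for an operator node


class DagBuilder:
    def __init__(self) -> None:
        self.nodes: List[Key] = []        # node number i describes self.nodes[i]
        self.index: Dict[Key, int] = {}   # signature -> existing node number

    def _node(self, key: Key) -> int:
        """Reuse an identical node if one exists; otherwise create it."""
        if key not in self.index:
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]

    def leaf(self, name: str) -> int:
        return self._node(("id", name))

    def op(self, op: str, left: int, right: int) -> int:
        return self._node((op, left, right))


# a + a * (b - c) + (b - c) * d, grouped left to right
dag = DagBuilder()
a1 = dag.leaf("a")
bc1 = dag.op("-", dag.leaf("b"), dag.leaf("c"))
left = dag.op("+", a1, dag.op("*", dag.leaf("a"), bc1))
bc2 = dag.op("-", dag.leaf("b"), dag.leaf("c"))   # found again, not rebuilt
root = dag.op("+", left, dag.op("*", bc2, dag.leaf("d")))
print(len(dag.nodes))  # 9 (a syntax tree for the same expression has 13 nodes)
```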

Review

• Example: for the assignment statement a = b * -c + b * -c, give a syntax tree, a dag and the postfix notation
  – Fig. 8.2, p. 464
7-34

Three-Address Code (3AC)

• 3AC is a sequence of statements of the general form x := y op z
  – x, y, z are names, constants or compiler-generated temporaries
  – op is any operator (arithmetic, logical)
• "3AC" because each statement usually has 3 addresses (2 for the operands, 1 for the result)
7-35

Examples

• Given the expression x + y * z, the 3AC is
    t1 := y * z
    t2 := x + t1
• Show the 3AC for (a) the syntax tree and (b) the dag discussed earlier in slide 7-34 (Fig. 8.2)
  – Fig. 8.5, p. 466
7-36

3AC

… contd

• A name in a program is replaced by a pointer to the symbol-table entry for that name
• 3AC statements are like assembly code
  – There are flow-control statements
  – They can have symbolic labels
  – A label represents the index of a 3AC statement in an array containing the intermediate code
7-37

Types of 3AC Statements

1. Assignment statements with binary operators (arithmetic or logical)
   – Of the form x := y op z
2. Assignment statements with unary operators (unary minus, logical not, shift, etc.)
   – Of the form x := op y
3. Copy statements
   – Of the form x := y
7-38

Types of 3AC Statements

… contd

4. Unconditional jump: goto L
   – The statement with label L is executed next
5. Conditional jump: if x relop y goto L
   – A relational operator relop (<, =, >=, …) is applied to x and y
   – If the relation holds, the statement with label L is executed next; otherwise, the statement following the conditional jump is executed next
7-39
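Since a label is just the index of a statement in the code array, jumps are straightforward to execute. A toy Python interpreter, assuming a hypothetical tuple encoding for copy, binary-assignment, goto and conditional-jump statements (not a standard format), used to run a small loop:

```python
from typing import Dict, List, Tuple, Union

Operand = Union[str, int]  # a name or an integer constant
Stmt = Tuple
# Hypothetical encoding used in this sketch:
#   ("copy", x, y)            x := y
#   ("bin", x, y, op, z)      x := y op z
#   ("goto", L)               goto L
#   ("if", x, relop, y, L)    if x relop y goto L
# A label L is simply an index into the statement list.

ARITH = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}
RELOP = {"<": lambda a, b: a < b, "<=": lambda a, b: a <= b, "=": lambda a, b: a == b,
         ">": lambda a, b: a > b, ">=": lambda a, b: a >= b}


def run(code: List[Stmt], env: Dict[str, int]) -> Dict[str, int]:
    def value(v: Operand) -> int:
        return env[v] if isinstance(v, str) else v   # names vs constants

    pc = 0
    while pc < len(code):
        s = code[pc]
        pc += 1                                       # default: fall through to next statement
        if s[0] == "copy":
            env[s[1]] = value(s[2])
        elif s[0] == "bin":
            env[s[1]] = ARITH[s[3]](value(s[2]), value(s[4]))
        elif s[0] == "goto":
            pc = s[1]
        elif s[0] == "if" and RELOP[s[2]](value(s[1]), value(s[3])):
            pc = s[4]                                 # relation holds: jump to L
    return env


# sum := 0; i := 1; while i <= n { sum := sum + i; i := i + 1 }
prog = [
    ("copy", "sum", 0),               # 0
    ("copy", "i", 1),                 # 1
    ("if", "i", ">", "n", 6),         # 2: leave the loop (6 is one past the end)
    ("bin", "sum", "sum", "+", "i"),  # 3
    ("bin", "i", "i", "+", 1),        # 4
    ("goto", 2),                      # 5
]
print(run(prog, {"n": 4})["sum"])  # 10
```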

Types of 3AC Statements

… contd

6. Function calls: param x, call p, n and return y
   – In return y, the returned value y is optional
   – E.g., for the call p(x1, x2, …, xn), the 3AC will be
       param x1
       param x2
       …
       param xn
       call p, n
7-40
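The calling sequence is mechanical to generate. A Python sketch, assuming the arguments have already been evaluated into names or temporaries; the hypothetical gen_call only emits the param and call statements:

```python
from typing import List


def gen_call(p: str, args: List[str]) -> List[str]:
    """3AC for call p(x1, ..., xn): one param per argument, then call p, n."""
    code = [f"param {x}" for x in args]
    code.append(f"call {p}, {len(args)}")   # n tells the callee how many params precede
    return code


for line in gen_call("p", ["x1", "x2", "x3"]):
    print(line)
```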

Types of 3AC Statements

… contd

7. Indexed assignments: x := y[i] and x[i] := y
   – In x := y[i]: x is set to the value in the location i units beyond memory location y
   – In x[i] := y: the location i units beyond memory location x is set to the value of y
   – x, y and i are data objects
7-41

Types of 3AC Statements

… contd

8. Address and pointer assignments: x := &y, x := *y, *x := y
   – In x := &y: x is set to the location of y
     • y denotes an l-value; x is a pointer name
   – In x := *y: (the r-value of) x is set to the value in the location pointed to by y
     • y is a pointer; the r-value of y is a location
   – In *x := y: (the r-value of) the object pointed to by x is set to (the r-value of) y
7-42
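The indexed and pointer forms differ only in which addresses they read and write. A toy Python model, assuming a flat memory of integer cells, a hypothetical addr table that plays the role of each name's l-value, and i counted in cells (a real compiler would scale i by the element width):

```python
from typing import Dict, List

memory: List[int] = [0] * 16                                       # flat memory of integer cells
addr: Dict[str, int] = {"x": 0, "y": 1, "i": 2, "p": 3, "arr": 8}  # l-value of each name


def rval(name: str) -> int:
    return memory[addr[name]]      # r-value: contents of the name's location


def store(name: str, v: int) -> None:
    memory[addr[name]] = v


def index_load(x: str, y: str, i: str) -> None:   # x := y[i]
    store(x, memory[addr[y] + rval(i)])


def index_store(x: str, i: str, y: str) -> None:  # x[i] := y
    memory[addr[x] + rval(i)] = rval(y)


def address_of(x: str, y: str) -> None:           # x := &y
    store(x, addr[y])


def deref_load(x: str, y: str) -> None:           # x := *y
    store(x, memory[rval(y)])


def deref_store(x: str, y: str) -> None:          # *x := y
    memory[rval(x)] = rval(y)


store("i", 2)
store("y", 7)
index_store("arr", "i", "y")   # arr[i] := y   writes cell 8 + 2
index_load("x", "arr", "i")    # x := arr[i]   x becomes 7
address_of("p", "x")           # p := &x       p holds x's address, 0
deref_store("p", "i")          # *p := i       overwrites x with 2
deref_load("y", "p")           # y := *p       y becomes 2
print(rval("x"), rval("y"), memory[10])  # 2 2 7
```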

Syntax-Dir. Translation into 3AC

• When 3AC is generated, temporary names are made up for the interior nodes of the syntax tree
  – E.g., for E → E1 + E2, the value of E on the LHS will be computed into a new temp t
• Example: Fig. 8.6 and Fig. 8.7 on p. 469
7-43
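The temporary-per-interior-node rule can be sketched as a recursive generator. A Python sketch, assuming hypothetical tree classes Name, Unary and Binary, and temporaries numbered t1, t2, … from a counter; it neither reuses temporaries nor exploits common subexpressions:

```python
from dataclasses import dataclass
from itertools import count
from typing import Iterator, List, Union


@dataclass
class Name:
    id: str


@dataclass
class Unary:
    op: str          # e.g. "uminus"
    arg: "Expr"


@dataclass
class Binary:
    op: str          # e.g. "+", "*"
    left: "Expr"
    right: "Expr"


Expr = Union[Name, Unary, Binary]


def gen(e: Expr, code: List[str], temps: Iterator[int]) -> str:
    """Append 3AC for e to code and return the name holding e's value.
    Every interior node gets a fresh temporary."""
    if isinstance(e, Name):
        return e.id                                   # leaf: no code needed
    if isinstance(e, Unary):
        arg_place = gen(e.arg, code, temps)
        t = f"t{next(temps)}"
        code.append(f"{t} := {e.op} {arg_place}")
        return t
    left_place = gen(e.left, code, temps)
    right_place = gen(e.right, code, temps)
    t = f"t{next(temps)}"
    code.append(f"{t} := {left_place} {e.op} {right_place}")
    return t


def gen_assign(target: str, e: Expr) -> List[str]:
    code: List[str] = []
    place = gen(e, code, count(1))                    # temporaries t1, t2, ...
    code.append(f"{target} := {place}")
    return code


def b_times_minus_c() -> Expr:
    return Binary("*", Name("b"), Unary("uminus", Name("c")))


# a := b * -c + b * -c, treated as a tree (no sharing)
for line in gen_assign("a", Binary("+", b_times_minus_c(), b_times_minus_c())):
    print(line)
```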

Implementation of 3AC

• 3AC is an abstract form
  – Can be implemented in a compiler as records (with fields for the operator and the operands)
• Three representations
  – Quadruples
  – Triples
  – Indirect triples
7-44

(a) Quadruples

• A record structure with 4 fields
  – op, arg1, arg2 and result
• Examples
  – For x := y op z, we have y in arg1, z in arg2 and x in result
  – For unary operators, arg2 is not used
  – For the param operator, arg2 and result are unused
  – Fig. 8.8(a), p. 471 for a := b * -c + b * -c
• Contents of the fields are pointers to symbol-table entries
7-45
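A quadruple maps naturally onto a record type. A Python sketch, assuming names are plain strings rather than symbol-table pointers; the quadruples for a := b * -c + b * -c are written out by hand (tree form, no sharing), and the small interpreter shows that every temporary needs a name of its own:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Quad:
    op: str
    arg1: Optional[str]
    arg2: Optional[str]
    result: Optional[str]


# a := b * -c + b * -c; strings stand in for symbol-table pointers
quads: List[Quad] = [
    Quad("uminus", "c", None, "t1"),   # unary: arg2 unused
    Quad("*", "b", "t1", "t2"),
    Quad("uminus", "c", None, "t3"),
    Quad("*", "b", "t3", "t4"),
    Quad("+", "t2", "t4", "t5"),
    Quad(":=", "t5", None, "a"),       # copy: arg2 unused
]

UNARY: Dict[str, Callable[[int], int]] = {"uminus": lambda v: -v}
BINARY: Dict[str, Callable[[int, int], int]] = {
    "+": lambda u, v: u + v,
    "*": lambda u, v: u * v,
}


def run(code: List[Quad], env: Dict[str, int]) -> Dict[str, int]:
    """Execute quadruples in order; temporaries live in env alongside program names."""
    for q in code:
        if q.op == ":=":
            env[q.result] = env[q.arg1]
        elif q.op in UNARY:
            env[q.result] = UNARY[q.op](env[q.arg1])
        else:
            env[q.result] = BINARY[q.op](env[q.arg1], env[q.arg2])
    return env


print(run(quads, {"b": 2, "c": 3})["a"])  # -12
```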

(b) Triples

• Temps generated in quadruples must be entered in the symbol table
• To avoid this, we can refer to a temp value by the position of the statement that computes it
  – We can then have records with only 3 fields
    • op, arg1 and arg2
  – Fields arg1 and arg2 can be pointers to symbol-table entries or to triple structures (for temp values)
7-46
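With triples, a temporary is identified by the position of the triple that computes it. A Python sketch, assuming tuples for triples, plain strings for names, and integers used exclusively for positions (so integer constants would need a separate tag in a real implementation); the assign operator name is an assumption of this sketch:

```python
from typing import Dict, List, Tuple, Union

Arg = Union[str, int, None]      # str: a name; int: position of an earlier triple
Triple = Tuple[str, Arg, Arg]

# a := b * -c + b * -c as triples; no temporary names appear anywhere
triples: List[Triple] = [
    ("uminus", "c", None),   # (0)
    ("*", "b", 0),           # (1)  b * (0)
    ("uminus", "c", None),   # (2)
    ("*", "b", 2),           # (3)  b * (2)
    ("+", 1, 3),             # (4)  (1) + (3)
    ("assign", "a", 4),      # (5)  a := (4)
]


def run(code: List[Triple], env: Dict[str, int]) -> Dict[str, int]:
    results: List[int] = []              # results[k] is the value computed by triple (k)

    def val(arg: Arg) -> int:
        # In this sketch an int always means a triple position (no integer constants)
        return results[arg] if isinstance(arg, int) else env[arg]

    for op, a1, a2 in code:
        if op == "uminus":
            r = -val(a1)
        elif op == "*":
            r = val(a1) * val(a2)
        elif op == "+":
            r = val(a1) + val(a2)
        elif op == "assign":
            r = val(a2)
            env[a1] = r
        else:
            raise ValueError(f"unknown op {op!r}")
        results.append(r)
    return env


print(run(triples, {"b": 2, "c": 3})["a"])  # -12
```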

(c) Indirect Triples

• A listing of pointers to triples, rather than the triples themselves
• Example
  – We can use an array to list pointers to triples in the desired order
  – Fig. 8.10 on p. 472
7-47
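The extra level of indirection lets statements be reordered (for example by an optimizer) without renumbering the triples that refer to one another. A Python sketch, assuming the triple array stays fixed and a separate statements list of indices gives the execution order:

```python
from typing import Dict, List, Tuple, Union

Arg = Union[str, int, None]      # str: a name; int: index of a triple
Triple = Tuple[str, Arg, Arg]

# Fixed triple array for a := b * -c + b * -c
triples: List[Triple] = [
    ("uminus", "c", None),   # 0
    ("*", "b", 0),           # 1
    ("uminus", "c", None),   # 2
    ("*", "b", 2),           # 3
    ("+", 1, 3),             # 4
    ("assign", "a", 4),      # 5
]


def run(statements: List[int], env: Dict[str, int]) -> Dict[str, int]:
    """Execute triples in the order given by the statement list (pointers into triples)."""
    results: Dict[int, int] = {}

    def val(arg: Arg) -> int:
        return results[arg] if isinstance(arg, int) else env[arg]

    for i in statements:
        op, a1, a2 = triples[i]
        if op == "uminus":
            results[i] = -val(a1)
        elif op == "*":
            results[i] = val(a1) * val(a2)
        elif op == "+":
            results[i] = val(a1) + val(a2)
        elif op == "assign":
            results[i] = val(a2)
            env[a1] = results[i]
        else:
            raise ValueError(f"unknown op {op!r}")
    return env


in_order = [0, 1, 2, 3, 4, 5]
reordered = [2, 3, 0, 1, 4, 5]   # second b * -c moved first; the triples are untouched
print(run(in_order, {"b": 2, "c": 3})["a"], run(reordered, {"b": 2, "c": 3})["a"])  # -12 -12
```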

Translating Language Constructs

• The rest of Chapter 8 in the Dragon book covers details on implementing:
  – Declarations, scope
  – Assignments, array elements, fields in records
  – Boolean expressions
  – Case statements
  – Filling in jump targets after they become known (called backpatching)
  – Function calls
7-48