Intermediate Code Generation
Professor Yihjia Tsai Tamkang University
Introduction
• Intermediate representation (IR)
  – Generally a program for an abstract machine (can be assembly language or slightly above)
  – Easy to produce and translate into target code
• Why?
  – When a re-targetable compiler is needed
    • i.e., if we are planning a portable compiler, with different back ends
  – Better/easier for some optimizations
    • Machine code can be more complex
Sanath Jayasena/Apr 2006
[Figure: source languages (Java, ML, Pascal, C); Intermediate Representation; target machines (Sparc, MIPS, Pentium, Alpha)]
Introduction
… contd
• Front end can do scanning, parsing, semantic analysis and translation to IR
• Back end will then optimize and generate target code
• IR can modularize the task
  – Front end not bothered about machine details
  – Back end not bothered about source language
Introduction
… contd
• Qualities of a good IR
  – Convenient for semantic analysis phase to produce
  – Convenient to translate into machine language of all desired target hardware
  – Each construct has a clear and simple meaning
    • Easy for optimizing transformations
Intermediate Representations
• Abstract syntax trees
• Postfix notation
• Directed acyclic graphs (DAGs)
• Three-address code (3AC)
Abstract Syntax Trees
• Also called Intermediate Representation (IR) trees
  – Have individual components that describe only very simple things
  – E.g., load, store, add, move, jump
  – E.g., pp. 136-139, Tiger book (see handout)
Postfix Notation
• For an expression E, inductively:
  1. If E is a variable or constant, the postfix notation is E
  2. If E is of the form E1 …
Example
• What are the postfix notations for (9-5)+2 and 9-(5+2)?
• (9-5)+2 in postfix notation is 95-2+
• 9-(5+2) in postfix notation is 952+-
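The inductive definition can be sketched in a few lines of Python. This is only an illustration: it assumes the usual rule for the binary case (postfix of the left operand, then postfix of the right operand, then the operator), and both the nested-tuple encoding and the name `postfix` are hypothetical, not taken from the slides.

```python
def postfix(e):
    """Return the postfix string for expression e.

    Assumed encoding: a leaf is a str such as "9" or "x";
    an operator node is a tuple (op, left, right).
    """
    if isinstance(e, str):           # variable or constant: postfix is itself
        return e
    op, left, right = e              # binary case: operands first, operator last
    return postfix(left) + postfix(right) + op


print(postfix(("+", ("-", "9", "5"), "2")))   # (9-5)+2  -> 95-2+
print(postfix(("-", "9", ("+", "5", "2"))))   # 9-(5+2)  -> 952+-
```

Parentheses never appear in the output: the grouping is already captured by the order of operands and operators.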
Syntax-Directed Translation
• Translation guided by CFGs
  – Based on “attributes” of language constructs
    • E.g., type, string, number, memory location
  – Attach attributes to grammar symbols
  – Values for attributes computed by semantic rules associated with productions
• Translation of a language construct in terms of attributes associated with its syntactic components
Syntax-Directed Translation
… contd
• Two notations for associating semantic rules with productions in a CFG
  1. Syntax-directed definitions
    • High-level specs, details hidden, order of translation unspecified
  2. Translation schemes
    • Order of translations specified, more details shown
• [Dragon book: Section 2.3 and Chapter 5]
Syntax-Directed Definitions
• For each grammar symbol: associate a set of attributes (synthesized and inherited)
• For each production: a semantic rule defines the values of attributes at the parse-tree node where the production is used
• Syntax-directed definition = grammar + set of semantic rules
Annotated Parse Tree
• A parse tree showing attribute values at each node
• Used for translation (which is an input-output mapping)
  – For input x, construct parse tree for x
  – If a node n in tree is labeled by symbol Y
    • Value of attribute p of Y at node n denoted as Y.p
    • Value of Y.p computed using semantic rule for attribute p associated with the Y production at n
Synthesized Attributes
• An attribute is synthesized if its value at a parse tree node is determined from those at the child nodes
• Can be evaluated with a single bottom-up tree traversal (e.g., depth-first traversal)
• A syntax-directed definition that uses these exclusively is said to be an S-attributed definition
Example 1
Translating expressions into postfix (slide 7-15)

Production              Semantic Rule
expr → expr1 + term     expr.t := expr1.t || term.t || ‘+’
expr → expr1 - term     expr.t := expr1.t || term.t || ‘-’
expr → term             expr.t := term.t
term → 0                term.t := ‘0’
…                       …
term → 9                term.t := ‘9’

Here .t is a string-valued attribute and || denotes string concatenation.
Example 1
… contd
Annotated parse tree corresponding to “9-5+2” (children indented under their parent):

expr.t = 95-2+
  expr.t = 95-
    expr.t = 9
      term.t = 9
        9
    -
    term.t = 5
      5
  +
  term.t = 2
    2
Example 2
Syntax-directed definition for desk calculator program

Production        Semantic Rule
L → E $           print(E.val)
E → E1 + T        E.val := E1.val + T.val
E → T             E.val := T.val
T → T1 * F        T.val := T1.val × F.val
T → F             T.val := F.val
F → digit         F.val := digit.lexval

• Draw the annotated parse tree for “3*5+4 $”
Example 2
… contd
Annotated parse tree corresponding to “3*5+4 $” (children indented under their parent):

L
  E.val = 19
    E.val = 15
      T.val = 15
        T.val = 3
          F.val = 3
            digit.lexval = 3
        *
        F.val = 5
          digit.lexval = 5
    +
    T.val = 4
      F.val = 4
        digit.lexval = 4
  $
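As a rough sketch of how the synthesized attribute val could be evaluated, the Python below walks a hand-built parse tree for “3*5+4” bottom-up. The tuple encoding of parse-tree nodes and the helper names `val` and `F` are assumptions made for this illustration only.

```python
def val(node):
    """Compute the synthesized attribute `val` in post-order.

    Assumed encoding: ("digit", lexval) for a leaf, otherwise
    (symbol, children) where children mixes sub-nodes and operator strings.
    """
    symbol, children = node
    if symbol == "digit":                       # digit.lexval
        return children
    vals = [val(c) for c in children if isinstance(c, tuple)]   # children first
    ops = [c for c in children if isinstance(c, str)]
    if symbol == "E" and ops == ["+"]:          # E -> E1 + T
        return vals[0] + vals[1]
    if symbol == "T" and ops == ["*"]:          # T -> T1 * F
        return vals[0] * vals[1]
    return vals[0]                              # E -> T, T -> F, F -> digit


def F(d):
    """Build the subtree F -> digit for a single digit (hypothetical helper)."""
    return ("F", [("digit", d)])


# Parse tree of E for "3*5+4"; the production L -> E $ then prints E.val.
tree = ("E", [("E", [("T", [("T", [F(3)]), "*", F(5)])]), "+", ("T", [F(4)])])
print(val(tree))   # 19
```

Every attribute value depends only on values at child nodes, which is why a single post-order traversal is enough.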
Inherited Attributes
• Value at a node is defined using attributes at siblings and/or parent of the node
• Useful for tracking the context of a construct
  – E.g., decide whether address or value of a var is needed by keeping track of whether it appears on RHS or LHS of an assignment
Example
Syntax-directed definition with inherited attribute L.in

Production        Semantic Rule
D → T L           L.in := T.type
T → int           T.type := integer
T → real          T.type := real
L → L1 , id       L1.in := L.in
                  addtype(id.entry, L.in)
L → id            addtype(id.entry, L.in)

• Draw the annotated parse tree for “real id1, id2, id3”
Example
… contd
Annotated parse tree for “real id1, id2, id3” with inherited attribute in at each node L (children indented under their parent):

D
  T.type = real
    real
  L.in = real
    L.in = real
      L.in = real
        id1
      ,
      id2
    ,
    id3
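One way to picture the inherited attribute is as an extra argument passed down during a tree walk. The following Python sketch makes that assumption; the dict-based symbol table, the tuple encoding of L, and the names `visit_D` and `visit_L` are hypothetical and only mirror the semantic rules of the example.

```python
symbol_table = {}


def addtype(entry, typ):
    """Record the type of an identifier (hypothetical dict-based symbol table)."""
    symbol_table[entry] = typ


def visit_L(node, inherited_in):
    """L -> L1 , id | id. The argument `inherited_in` plays the role of L.in."""
    if len(node) == 1:                  # L -> id
        addtype(node[0], inherited_in)
    else:                               # L -> L1 , id
        l1, ident = node
        visit_L(l1, inherited_in)       # L1.in := L.in
        addtype(ident, inherited_in)


def visit_D(type_keyword, l_node):
    """D -> T L: T.type is synthesized, then handed down as L.in."""
    t_type = {"int": "integer", "real": "real"}[type_keyword]
    visit_L(l_node, t_type)             # L.in := T.type


# "real id1, id2, id3" with L encoded as nested tuples (left-recursive shape).
visit_D("real", ((("id1",), "id2"), "id3"))
print(symbol_table)
```

The type flows from T sideways to L and then down the chain of L nodes, which is exactly what a purely bottom-up evaluation could not express.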
Translation Schemes
• Semantic actions embedded within RHS of productions
  – Unlike syntax-directed definitions, order of evaluation of semantic rules is shown explicitly
  – Action to be taken shown by enclosing in { }
    • E.g., rterm → + term { print(‘+’) } rterm1
  – In a parse tree in this context, an action is shown by an extra child node & dashed edge
Depth-First Order
• L-attributed definitions
  – Attributes can always be evaluated in depth-first order (left-to-right)
• Translation schemes with restrictions motivated by L-attributed definitions ensure that an attribute value is available when an action refers to it
  – E.g., when only synthesized attributes exist
Example
• Translation scheme that maps infix expressions with addition/subtraction into corresponding postfix expressions

E → T R
R → addop T { print(addop.lexeme) } R1 | Λ
R → subop T { print(subop.lexeme) } R2 | Λ
T → num { print(num.val) }

• Show the parse tree for “9-5+2”
Example
… contd
Parse tree for “9-5+2” showing actions (children indented under their parent); when performed in depth-first order, prints “95-2+”:

E
  T
    9
    { print(‘9’) }
  R
    -
    T
      5
      { print(‘5’) }
    { print(‘-’) }
    R
      +
      T
        2
        { print(‘2’) }
      { print(‘+’) }
      R
        Λ
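A translation scheme of this shape maps naturally onto a recursive-descent parser, with each embedded action executed at the point where it appears in the production. The Python below is a minimal sketch under that assumption; the function name `translate`, the single-character token list, and collecting output in a list instead of printing are choices made for this illustration.

```python
def translate(tokens):
    """Translate infix tokens (numbers, '+', '-') to a postfix string."""
    out = []          # collects what the print actions would emit
    pos = 0

    def T():          # T -> num { print(num.val) }
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise SyntaxError("expected num")
        out.append(tokens[pos])
        pos += 1

    def R():          # R -> op T { print(op.lexeme) } R | Λ
        nonlocal pos
        if pos < len(tokens) and tokens[pos] in ("+", "-"):
            op = tokens[pos]
            pos += 1
            T()
            out.append(op)    # action sits after T and before the trailing R
            R()
        # otherwise R -> Λ: nothing to emit

    T()               # E -> T R
    R()
    if pos != len(tokens):
        raise SyntaxError("unexpected token")
    return "".join(out)


print(translate(list("9-5+2")))   # 95-2+
```

Moving the `out.append(op)` line before the call to `T()` would emit the operator too early, which shows why the position of an action inside the RHS matters.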
Emitting a Translation
• For simple syntax-directed definitions, implementation possible with translation schemes where actions print additional strings in the order of appearance
  – [Simple: string representing the translation of the non-terminal on LHS of each production is the concatenation of translations of non-terminals on the RHS, in the same order as in the production]
Example
• A translation scheme derived from Example 1 (slide 7-15)

expr → expr + term { print(‘+’) }
expr → expr - term { print(‘-’) }
expr → term
term → 0 { print(‘0’) }
term → 1 { print(‘1’) }
…
term → 9 { print(‘9’) }
Example
… contd
Actions translating “9-5+2” into “95-2+” (children indented under their parent):

expr
  expr
    expr
      term
        9
        { print(‘9’) }
    -
    term
      5
      { print(‘5’) }
    { print(‘-’) }
  +
  term
    2
    { print(‘2’) }
  { print(‘+’) }
Constructing Syntax Trees
• Syntax-directed definitions can be used
• Recall: syntax tree is a condensed form of parse tree
  – Operators, keywords appear as interior nodes
• Construction: similar to postfix notation
  – For a subexpression, create a node for each operator and operand
  – Children of operator node represent operands (as subexpressions) of that operator
Nodes in a Syntax Tree
• A node is like a record with many fields:
  – label, pointers to operand nodes, value, etc.
• 3 basic functions to create nodes
  – mknode(op, left, right): operator node with label op, two pointer fields left and right
  – mkleaf(id, entry): ID node with label id and field entry pointing to symbol-table entry
  – mkleaf(num, val): a NUM node with label num and field val
Example
• From Example 5.7, p. 288
  – What is the sequence of calls to create the syntax tree for the expression “a - 4 + c”?

    p1 = mkleaf(id, entry_a);
    p2 = mkleaf(num, 4);
    p3 = mknode(‘-’, p1, p2);
    p4 = mkleaf(id, entry_c);
    p5 = mknode(‘+’, p3, p4);

  – What is the syntax tree?
Constructing Syntax Trees
… contd
• A syntax-directed definition may be used for constructing a syntax tree
  – Semantic rules: calls to functions mknode( ) and mkleaf( )
  – E.g., for the production E → E1 + T, we may have the semantic rule
    E.nptr = mknode(‘+’, E1.nptr, T.nptr)
  – Example 5.8, p. 289
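The node-building functions can be mocked up in Python to see what the call sequence for “a - 4 + c” produces. This is a sketch, not the book's implementation: the `Node` dataclass, the use of plain strings in place of symbol-table entries, and the `show` helper are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Node:
    label: str                        # operator, "id" or "num"
    left: Optional["Node"] = None     # pointer to left operand node
    right: Optional["Node"] = None    # pointer to right operand node
    value: Any = None                 # symbol-table entry (id) or number (num)


def mknode(op, left, right):
    """Create an operator node with two operand pointers."""
    return Node(op, left, right)


def mkleaf(label, value):
    """Create an id or num leaf; `value` is the entry or the number."""
    return Node(label, value=value)


def show(n):
    """Render the tree as a fully parenthesized infix string."""
    if n.left is None and n.right is None:
        return str(n.value)
    return f"({show(n.left)} {n.label} {show(n.right)})"


# Strings "a" and "c" stand in for symbol-table entries (an assumption).
p1 = mkleaf("id", "a")
p2 = mkleaf("num", 4)
p3 = mknode("-", p1, p2)
p4 = mkleaf("id", "c")
p5 = mknode("+", p3, p4)
print(show(p5))   # ((a - 4) + c)
```

The tree is built bottom-up: each mknode call receives pointers that earlier calls returned, just as E.nptr is computed from E1.nptr and T.nptr.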
DAGs for Expressions
• A dag for an expression identifies common subexpressions
  – Unlike a syntax tree, a node for a common subexpression may have > 1 parent node
  – E.g., “a + a * (b-c) + (b-c) * d”
    • Fig. 5.11, p. 291
• How to create a dag, given an expression?
  – Check if an identical node already exists
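The "check if an identical node already exists" step can be sketched with a dictionary keyed by a node's label and children. The class name `DagBuilder` and the use of integer ids as node pointers are assumptions for this illustration.

```python
class DagBuilder:
    """Create dag nodes, reusing an existing node when an identical one exists."""

    def __init__(self):
        self.nodes = []    # node id -> (label, left id, right id)
        self.index = {}    # (label, left id, right id) -> node id

    def _get(self, key):
        if key not in self.index:          # no identical node yet: create it
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]             # otherwise return the existing node

    def leaf(self, name):
        return self._get((name, None, None))

    def node(self, op, left, right):
        return self._get((op, left, right))


# a + a * (b-c) + (b-c) * d
g = DagBuilder()
a, b, c, d = (g.leaf(x) for x in "abcd")
bc1 = g.node("-", b, c)
s1 = g.node("+", a, g.node("*", a, bc1))
bc2 = g.node("-", b, c)                    # identical to bc1, so it is reused
root = g.node("+", s1, g.node("*", bc2, d))
print(bc1 == bc2, len(g.nodes))            # True 9
```

The corresponding syntax tree would have 13 nodes (7 leaves and 6 operators); sharing the leaves a, b, c and the subexpression b-c brings the dag down to 9.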
Review
• Example: for the assignment statement a = b * -c + b * -c, give a syntax tree, dag and postfix notation (slide 7-34)
  – Fig. 8.2, p. 464
Three-Address Code (3AC)
• 3AC is a sequence of statements of the general form x := y op z
Examples
• Given the expression x+y*z, the 3AC is:
    t1 := y * z
    t2 := x + t1
• Show 3AC for (a) syntax tree, (b) dag discussed earlier in slide 7-34 (Fig. 8.2)
  – Fig. 8.5, p. 466
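A small Python sketch shows how such code could be produced from an expression tree by inventing a new temporary for every interior node. The nested-tuple encoding and the name `gen_3ac` are hypothetical.

```python
import itertools


def gen_3ac(expr):
    """Return (place, code): the name holding the result and the 3AC lines.

    Assumed encoding: a leaf is a str name; an interior node is (op, left, right).
    """
    counter = itertools.count(1)
    code = []

    def walk(e):
        if isinstance(e, str):            # a name is its own place
            return e
        op, left, right = e
        l = walk(left)                    # operands are translated first
        r = walk(right)
        t = f"t{next(counter)}"           # fresh temp for this interior node
        code.append(f"{t} := {l} {op} {r}")
        return t

    return walk(expr), code


place, code = gen_3ac(("+", "x", ("*", "y", "z")))
print("\n".join(code))
# t1 := y * z
# t2 := x + t1
```

Each statement has at most one operator on the right-hand side, so nested expressions are unfolded into a linear sequence.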
3AC
… contd
• A name in a program is replaced by a pointer to a symbol table entry for that name
• 3AC statements are like assembly code
  – There are flow-control statements
  – They can have symbolic labels
  – A label represents the index of a 3AC statement in an array containing the intermediate code
Types of 3AC Statements
1. Assignment statements with binary operators (arithmetic or logical)
  – Of the form x := y op z
Types of 3AC Statements
… contd
4. Unconditional jump: goto L
  – Statement with label L to be executed next
5. Conditional jump: if x …
Types of 3AC Statements
… contd
6. Function calls: param x, call p, n and return y
  – “return y” is optional
  – E.g., for call p(x1, x2, …, xn) the 3AC will be
      param x1
      param x2
      …
      param xn
      call p, n
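The calling sequence is mechanical enough to sketch as a tiny Python helper. The name `emit_call` is hypothetical, and the arguments are assumed to be already reduced to simple names or temporaries.

```python
def emit_call(proc, args):
    """Return the 3AC lines for a call proc(args...): params first, then call."""
    code = [f"param {a}" for a in args]          # one param statement per argument
    code.append(f"call {proc}, {len(args)}")     # n = number of params to consume
    return code


print("\n".join(emit_call("p", ["x1", "x2", "x3"])))
# param x1
# param x2
# param x3
# call p, 3
```

The count n in the call statement tells the callee how many of the preceding param statements belong to this call.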
Types of 3AC Statements
… contd
7. Indexed assignments: x := y[i], x[i] := y
  – In x := y[i]: x is set to the value in location i units beyond memory location y
  – In x[i] := y: value in location i units beyond memory location x is set to the value of y
  – x, y and i are data objects
Types of 3AC Statements
… contd
8. Address & pointer assignments: x := &y, x := *y, *x := y
  – In x := &y: x is set to be the location of y
    • y denotes an l-value, x is a pointer name
  – In x := *y: (r-value of) x is set to the value in location pointed by y
    • y is a pointer; r-value of y is a location
  – In *x := y: (r-value of) object pointed by x is set to (the r-value of) y
Syntax-Dir. Translation into 3AC
• When 3AC code is generated, temp names are made up for interior nodes in syntax tree
  – E.g., for E → E1 + E2, value of E on LHS will be computed into a new temp t
• Example
  – Fig. 8.6, Fig. 8.7 on p. 469
Implementation of 3AC
• 3AC is an abstract form
  – Can be implemented in a compiler as records (with fields for operator and operands)
• Three representations
  – Quadruples
  – Triples
  – Indirect triples
(a) Quadruples
• A record structure with 4 fields
  – op, arg1, arg2 and result
• Examples
  – For x := y op z we have:
    • y in arg1, z in arg2 and x in result
  – For unary operators, arg2 not used
  – For param operator, arg2 and result unused
  – Fig. 8.8(a), p. 471 for a := b * -c + b * -c
• Contents of fields are pointers to symbol-table (ST) entries
(b) Triples
• Temps generated in quadruples must be entered in symbol table
• To avoid this, we can refer to a temp value by the location of the relevant statement
  – We can have records with only 3 fields
    • op, arg1 and arg2
  – Fields arg1 and arg2 can be pointers to ST entries or to triple structure for temp values
(c) Indirect Triples
• Listing of pointers to triples, rather than triples themselves
• Example
  – We can use an array to list pointers to triples in the desired order
  – Example: Fig. 8.10 on p. 472
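To compare the three representations side by side, the Python sketch below starts from hand-written quadruples for a := b * -c + b * -c and derives triples and an indirect-triple listing. The tuple layout, the name `to_triples`, the operator names `uminus` and `assign`, and the use of integers as pointers to triples are assumptions for illustration; the book's figures are not reproduced here.

```python
# Quadruples: (op, arg1, arg2, result); names are plain strings here.
quads = [
    ("uminus", "c", None, "t1"),
    ("*", "b", "t1", "t2"),
    ("uminus", "c", None, "t3"),
    ("*", "b", "t3", "t4"),
    ("+", "t2", "t4", "t5"),
    (":=", "t5", None, "a"),
]


def to_triples(quads):
    """Drop the result field: a temp is referred to by its triple's index (int)."""
    where = {}       # temp name -> index of the triple that computes it
    triples = []
    for op, a1, a2, res in quads:
        a1 = where.get(a1, a1)                    # temp name -> triple index
        a2 = where.get(a2, a2)
        if op == ":=":
            triples.append(("assign", res, a1))   # target in arg1, value in arg2
        else:
            where[res] = len(triples)
            triples.append((op, a1, a2))
    return triples


triples = to_triples(quads)
for i, t in enumerate(triples):
    print(i, t)

# Indirect triples: a separate array of pointers gives the execution order,
# so statements can be reordered without renumbering the triples themselves.
statement_order = list(range(len(triples)))
```

With triples, no temporary names t1 to t5 need to be entered in the symbol table, at the cost of making statement positions significant; the extra `statement_order` array removes that cost.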
Translating Language Constructs
• Balance of Chapter 8 in Dragon book covers details on implementing:
  – Declarations, scope
  – Assignments, array elements, fields in records
  – Boolean expressions
  – Case statements
  – Label renaming (called backpatching)
  – Function calls