Abstract Syntax
Mooly Sagiv
Schrierber 317
03-640-7606
Wed 10:00-12:00
html://www.math.tau.ac.il/~msagiv/courses/wcc.html
Outline
• The general idea
• Bison
• Motivating example: interpreter for arithmetic expressions
• The need for abstract syntax
• Abstract syntax for Straight-line code
• Abstract syntax for Tiger (recitation)
Semantic Analysis during Recursive Descent Parsing
• Scanner returns “semantic values” for some tokens
• The function of every non-terminal returns the “corresponding subtree value”
• When A ::= B C D is applied, the function for A can use the values returned by B, C, and D
– The function can also pass parameters, e.g., to D(), reflecting left contexts
E ::= num E’
E’ ::= empty-string
E’ ::= + num E’

int E() {
  switch (tok) {
    case num: temp = tok.val; eat(num);
              return EP(temp);
    default:  error(…); }}

int EP(int left) {
  switch (tok) {
    case $: return left;
    case +: eat(+);
            temp = tok.val; eat(num);
            return EP(left + temp);
    default: error(…); }}
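The pseudocode above can be turned into a small runnable sketch in C. The scanner, the `struct parser` record, the `failed` flag and the entry point `eval_sum` are assumptions made for this illustration and are not part of the original slides; only the shape of `E` and `EP` (here `parse_E` and `parse_EP`) follows the slide.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical token kinds for E ::= num E', E' ::= + num E' | empty. */
enum kind { NUM, PLUS, END, BAD };

struct parser {
    const char *src;   /* remaining input */
    enum kind tok;     /* current token kind */
    int val;           /* semantic value when tok == NUM */
    int failed;        /* set to 1 on the first syntax error */
};

/* Scanner: reads the next token and, for numbers, its semantic value. */
static void advance(struct parser *p) {
    while (*p->src == ' ') p->src++;
    if (*p->src == '\0') { p->tok = END; return; }
    if (*p->src == '+') { p->src++; p->tok = PLUS; return; }
    if (isdigit((unsigned char)*p->src)) {
        char *end;
        p->val = (int)strtol(p->src, &end, 10);
        p->src = end;
        p->tok = NUM;
        return;
    }
    p->tok = BAD;
}

/* E' ::= + num E' | empty.
   `left` is the left context: the value of everything parsed so far. */
static int parse_EP(struct parser *p, int left) {
    switch (p->tok) {
    case END:
        return left;                         /* E' ::= empty */
    case PLUS: {
        advance(p);                          /* eat(+) */
        if (p->tok != NUM) { p->failed = 1; return 0; }
        int temp = p->val;
        advance(p);                          /* eat(num) */
        return parse_EP(p, left + temp);
    }
    default:
        p->failed = 1;
        return 0;
    }
}

/* E ::= num E' */
static int parse_E(struct parser *p) {
    if (p->tok != NUM) { p->failed = 1; return 0; }
    int temp = p->val;
    advance(p);                              /* eat(num) */
    return parse_EP(p, temp);
}

/* Evaluates a sum such as "7+11+17"; stores 0 in *ok on a syntax error. */
int eval_sum(const char *text, int *ok) {
    assert(text != NULL && ok != NULL);
    struct parser p = { text, BAD, 0, 0 };
    advance(&p);
    int result = parse_E(&p);
    *ok = !p.failed;
    return result;
}
```

With these definitions, `eval_sum("7+11+17", &ok)` returns 35 and sets `ok` to 1. Passing the running sum as the `left` parameter is what makes the addition associate to the left even though the grammar is right-recursive.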
Semantic Analysis during Bottom-Up Parsing
• Scanner returns “semantic values” for some tokens
• Use parser stack to store the “corresponding subtree values”
• When A ::= B C D is reduced, the function for A can use the values returned by B, C, and D
• No action in the middle of the rule
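Example: parsing 5 + 7 with E ::= num and E ::= E + num
• num (value 5) is reduced by E ::= num to E with value 5
• E (value 5) + num (value 7) is reduced by E ::= E + num to E with value 12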
Bison Specification
Declarations
%%
Productions
%%
C routines
Interpreter (in Bison)
%{ declarations of yylex() and yyerror() %}
%union {
  int num;
  string id; }
%token <id> ID
%token <num> NUM
%type <num> e f t
%start e
%%
e : e '+' t { $$ = $1 + $3; }
  | e '-' t { $$ = $1 - $3; }
  | t { $$ = $1; }
  ;
t : t '*' f { $$ = $1 * $3; }
  | t '/' f { $$ = $1 / $3; }
  | f { $$ = $1; }
  ;
f : NUM { $$ = $1; }
  | ID { $$ = lookup($1); }
  | '-' e { $$ = - $2; }
  | '(' e ')' { $$ = $2; }
  ;
Interpreter (compact spec.)
%{ declarations of yylex() and yyerror() %}
%union {
  int num;
  string id; }
%token <id> ID
%token <num> NUM
%type <num> e
%start e
%left PLUS MINUS
%left MUL DIV
%right UMINUS
%%
e : e PLUS e { $$ = $1 + $3; }
  | e MINUS e { $$ = $1 - $3; }
  | e MUL e { $$ = $1 * $3; }
  | e DIV e { $$ = $1 / $3; }
  | NUM
  | ID { $$ = lookup($1); }
  | MINUS e %prec UMINUS { $$ = - $2; }
  | '(' e ')' { $$ = $2; }
  ;
stack                 input       action
$                     7+11+17$    shift
$ num 7               +11+17$     reduce e ::= num
$ e 7                 +11+17$     shift
$ e 7 +               11+17$      shift
$ e 7 + num 11        +17$        reduce e ::= num
$ e 7 + e 11          +17$        reduce e ::= e + e
$ e 18                +17$        shift
$ e 18 +              17$         shift
$ e 18 + num 17       $           reduce e ::= num
$ e 18 + e 17         $           reduce e ::= e + e
$ e 35                $           accept
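The trace above can be imitated by a hand-written loop that keeps a semantic value next to every stack symbol. This is only a sketch for the grammar e ::= e + e | num with left-associative +. It is not how Bison works internally (there are no LR states or tables), and the names `shift_reduce_sum`, `struct entry` and `MAX_STACK` are invented for the illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

enum sym { SYM_NUM, SYM_PLUS, SYM_E };     /* grammar symbols kept on the stack */
struct entry { enum sym sym; int val; };   /* a symbol plus its semantic value */

#define MAX_STACK 64

/* Returns 1 and stores the sum in *result on accept; returns 0 on a
   syntax error or stack overflow. Input is digits and '+' only. */
int shift_reduce_sum(const char *in, int *result) {
    struct entry stack[MAX_STACK];
    int top = 0;                           /* number of entries in use */
    assert(in != NULL && result != NULL);

    for (;;) {
        /* reduce e ::= num: the value of num becomes the value of e */
        if (top >= 1 && stack[top - 1].sym == SYM_NUM) {
            stack[top - 1].sym = SYM_E;
            continue;
        }
        /* reduce e ::= e + e: pop three entries, push their sum.
           Reducing before shifting the next '+' gives left associativity. */
        if (top >= 3 && stack[top - 1].sym == SYM_E
                     && stack[top - 2].sym == SYM_PLUS
                     && stack[top - 3].sym == SYM_E) {
            stack[top - 3].val += stack[top - 1].val;
            top -= 2;
            continue;
        }
        /* end of input: accept only if the stack holds a single e */
        if (*in == '\0') {
            if (top == 1 && stack[0].sym == SYM_E) {
                *result = stack[0].val;
                return 1;
            }
            return 0;
        }
        /* shift the next token */
        if (top == MAX_STACK) return 0;
        if (*in == '+') {
            stack[top].sym = SYM_PLUS;
            stack[top].val = 0;
            top++;
            in++;
        } else if (isdigit((unsigned char)*in)) {
            char *end;
            stack[top].sym = SYM_NUM;
            stack[top].val = (int)strtol(in, &end, 10);
            top++;
            in = end;
        } else {
            return 0;
        }
    }
}
```

On the input `7+11+17` the loop performs the same shifts and reductions as the table and leaves 35 in `*result`. Unlike a table-driven LR parser, this sketch only notices some errors at the end of the input.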
So why can’t we write all the compiler code in Bison?
%{
#include <assert.h>
#include <stdio.h>
#include <string.h>
/* string is assumed to be a typedef for char * */
typedef struct table *Table_;
struct table { string id; int value; Table_ tail; };
Table_ Table(string id, int value, struct table *tail);
Table_ table = NULL;
int lookup(Table_ table, string id) {
  assert(table != NULL);   /* fails if id was never assigned */
  if (strcmp(id, table->id) == 0) return table->value;
  else return lookup(table->tail, id);
}
void update(Table_ *tabptr, string id, int value) {
  *tabptr = Table(id, value, *tabptr);   /* new binding goes in front */
}
%}
%union { int num; string id; }
%token <num> INT
%token <id> ID
%token ASSIGN PRINT LPAREN RPAREN
%type <num> exp
%left SEMICOLUMN COMMA
%left PLUS MINUS
%left TIMES DIV
%start prog
%%
prog : stm
     ;
stm  : stm SEMICOLUMN stm
     | ID ASSIGN exp { update(&table, $1, $3); }
     | PRINT LPAREN exps RPAREN { printf("\n"); }
     ;
exps : exp { printf("%d", $1); }
     | exps COMMA exp { printf("%d", $3); }
     ;
exp  : INT { $$ = $1; }
     | ID { $$ = lookup(table, $1); }
     | exp PLUS exp { $$ = $1 + $3; }
     | exp MINUS exp { $$ = $1 - $3; }
     | exp TIMES exp { $$ = $1 * $3; }
     | exp DIV exp { $$ = $1 / $3; }
     | stm COMMA exp { $$ = $3; }
     | '(' exp ')' { $$ = $2; }
     ;
Historical Perspective
• Originally parsers were written without tools
• yacc, bison, ... made tools acceptable
• But it is still difficult to write compilers in parser actions (top-down and bottom-up)
– Natural grammars are ambiguous
– No modularity principle
– Many useful programming language features prevent code generation while parsing
  • Use before declaration
  • gotos
Abstract Syntax
• Intermediate program representation
• Defines a tree that preserves the program hierarchy
• Generated by the parser
• Declared using an (ambiguous) context free grammar (relatively flat)
– Not meant for parsing
• Keywords and punctuation symbols are not stored (not relevant once the tree exists)
• Big programs can also be handled (possibly via virtual memory)
Abstract Syntax for Straight-line Program
Stm ::= Stm Stm             (CompoundStm)
Stm ::= id Exp              (AssignStm)
Stm ::= ExpList             (PrintStm)
Exp ::= id                  (IdExp)
Exp ::= num                 (NumExp)
Exp ::= Exp Binop Exp       (OpExp)
Exp ::= Stm Exp             (EseqExp)
ExpList ::= Exp ExpList     (PairExpList)
ExpList ::= Exp             (LastExpList)
Binop ::= +                 (Plus)
Binop ::= -                 (Minus)
Binop ::= *                 (Times)
Binop ::= /                 (Div)
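One possible way to declare this abstract syntax in C is a tagged union per non-terminal with one constructor function per production. The constructor names follow the ones used in the parser actions of these slides; the struct layouts, the enum constant names and the `checked_alloc` helper are assumptions made for this sketch and need not match the course's absyn.h.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct A_stm_ *A_stm;
typedef struct A_exp_ *A_exp;
typedef struct A_expList_ *A_expList;
typedef enum { A_Plus, A_Minus, A_Times, A_Div } A_binop;

struct A_stm_ {
    enum { A_compoundStm, A_assignStm, A_printStm } kind;
    union {
        struct { A_stm stm1, stm2; } compound;          /* Stm ::= Stm Stm */
        struct { const char *id; A_exp exp; } assign;   /* Stm ::= id Exp  */
        struct { A_expList exps; } print;               /* Stm ::= ExpList */
    } u;
};

struct A_exp_ {
    enum { A_idExp, A_numExp, A_opExp, A_eseqExp } kind;
    union {
        const char *id;                                      /* Exp ::= id  */
        int num;                                             /* Exp ::= num */
        struct { A_exp left; A_binop oper; A_exp right; } op;
        struct { A_stm stm; A_exp exp; } eseq;               /* Exp ::= Stm Exp */
    } u;
};

/* tail == NULL plays the role of LastExpList */
struct A_expList_ { A_exp head; A_expList tail; };

/* Hypothetical helper: allocate or abort. */
static void *checked_alloc(size_t n) {
    void *p = malloc(n);
    if (p == NULL) abort();
    return p;
}

A_stm A_CompoundStm(A_stm stm1, A_stm stm2) {
    A_stm s = checked_alloc(sizeof *s);
    s->kind = A_compoundStm; s->u.compound.stm1 = stm1; s->u.compound.stm2 = stm2;
    return s;
}
A_stm A_AssignStm(const char *id, A_exp exp) {
    A_stm s = checked_alloc(sizeof *s);
    s->kind = A_assignStm; s->u.assign.id = id; s->u.assign.exp = exp;
    return s;
}
A_stm A_PrintStm(A_expList exps) {
    A_stm s = checked_alloc(sizeof *s);
    s->kind = A_printStm; s->u.print.exps = exps;
    return s;
}
A_exp A_IdExp(const char *id) {
    A_exp e = checked_alloc(sizeof *e);
    e->kind = A_idExp; e->u.id = id;
    return e;
}
A_exp A_NumExp(int num) {
    A_exp e = checked_alloc(sizeof *e);
    e->kind = A_numExp; e->u.num = num;
    return e;
}
A_exp A_OpExp(A_exp left, A_binop oper, A_exp right) {
    A_exp e = checked_alloc(sizeof *e);
    e->kind = A_opExp; e->u.op.left = left; e->u.op.oper = oper; e->u.op.right = right;
    return e;
}
A_exp A_EseqExp(A_stm stm, A_exp exp) {
    A_exp e = checked_alloc(sizeof *e);
    e->kind = A_eseqExp; e->u.eseq.stm = stm; e->u.eseq.exp = exp;
    return e;
}
A_expList A_ExpList(A_exp head, A_expList tail) {
    assert(head != NULL);   /* every list cell carries an expression */
    A_expList l = checked_alloc(sizeof *l);
    l->head = head; l->tail = tail;
    return l;
}
```

For example, the statement `a := 5 + 3` would be built as `A_AssignStm("a", A_OpExp(A_NumExp(5), A_Plus, A_NumExp(3)))`. The tree records no `:=` token and no parentheses, which is the sense in which keywords and punctuation are not stored.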
%{
#include "absyn.h"
%}
%union { int num; string id;
         A_stm stm;
         A_exp exp;
         A_expList expList; }
%token <num> INT
%token <id> ID
%token ASSIGN PRINT LPAREN RPAREN
%type <stm> prog stm
%type <exp> exp
%type <expList> exps
%left SEMICOLUMN COMMA
%left PLUS MINUS
%left TIMES DIV
%start prog
%%
prog : stm { $$ = $1; }
     ;
stm  : stm SEMICOLUMN stm { $$ = A_CompoundStm($1, $3); }
     | ID ASSIGN exp { $$ = A_AssignStm($1, $3); }
     | PRINT LPAREN exps RPAREN { $$ = A_PrintStm($3); }
     ;
exps : exp { $$ = A_ExpList($1, NULL); }
     | exp COMMA exps { $$ = A_ExpList($1, $3); }
     ;
exp  : INT { $$ = A_NumExp($1); }
     | ID { $$ = A_IdExp($1); }
     | exp PLUS exp { $$ = A_OpExp($1, A_Plus, $3); }
     | exp MINUS exp { $$ = A_OpExp($1, A_Minus, $3); }
     | exp TIMES exp { $$ = A_OpExp($1, A_Time, $3); }
     | exp DIV exp { $$ = A_OpExp($1, A_Div, $3); }
     | stm COMMA exp { $$ = A_EseqExp($1, $3); }
     | '(' exp ')' { $$ = $2; }
     ;
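Once the parser returns a tree instead of a number, evaluation can move into a separate pass that never looks at tokens. The sketch below covers only number and operator expressions. The struct layout and the function `eval_exp` are assumptions for the illustration, and the operator constants are spelled `A_Plus`, `A_Minus`, `A_Times`, `A_Div` here, which may differ from the course's absyn.h.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { A_Plus, A_Minus, A_Times, A_Div } A_binop;
typedef struct A_exp_ *A_exp;

struct A_exp_ {
    enum { A_numExp, A_opExp } kind;
    union {
        int num;
        struct { A_exp left; A_binop oper; A_exp right; } op;
    } u;
};

A_exp A_NumExp(int num) {
    A_exp e = malloc(sizeof *e);
    if (e == NULL) abort();
    e->kind = A_numExp;
    e->u.num = num;
    return e;
}

A_exp A_OpExp(A_exp left, A_binop oper, A_exp right) {
    A_exp e = malloc(sizeof *e);
    if (e == NULL) abort();
    e->kind = A_opExp;
    e->u.op.left = left;
    e->u.op.oper = oper;
    e->u.op.right = right;
    return e;
}

/* A later phase: walks the tree bottom-up and computes its value. */
int eval_exp(A_exp e) {
    assert(e != NULL);
    switch (e->kind) {
    case A_numExp:
        return e->u.num;
    case A_opExp: {
        int l = eval_exp(e->u.op.left);
        int r = eval_exp(e->u.op.right);
        switch (e->u.op.oper) {
        case A_Plus:  return l + r;
        case A_Minus: return l - r;
        case A_Times: return l * r;
        case A_Div:   assert(r != 0); return l / r;
        }
        break;
    }
    }
    abort();   /* not reached for well-formed trees */
}
```

For instance, `eval_exp(A_OpExp(A_OpExp(A_NumExp(2), A_Plus, A_NumExp(3)), A_Times, A_NumExp(4)))` yields 20. The grouping comes from the tree shape alone, so this pass needs neither precedence declarations nor parentheses.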
Summary
• Flex and Bison simplify the task of writing compiler/interpreter front-ends
• Abstract syntax provides a clear interface with other compiler phases
– Supports general programming languages
• But the design of an abstract syntax for a given programming language may take some time