ML-YACC
David Walker
COS 320
Outline
• Last Week
– Introduction to Lexing, CFGs, and Parsing
• Today:
– More parsing:
• automatic parser generation via ML-Yacc
– Reading: Chapter 3 of Appel
Parser Implementation
• Implementation Options:
1. Write a parser from scratch
– not as boring as writing a lexer, but not exactly a
weekend in the Bahamas
2. Use a parser generator
– very general and robust; sometimes not quite as
efficient as hand-written parsers. Nevertheless,
good for lazy compiler writers.
• Diagram:
parser specification --> parser generator --> parser
stream of tokens --> parser --> abstract syntax
ML-Yacc specification
• three parts:
User Declarations: declare values available in the rule actions
%%
ML-Yacc Definitions: declare terminals and non-terminals;
special declarations to resolve conflicts
%%
Rules: parser specified by CFG rules and
associated semantic actions that generate abstract syntax
ML-Yacc declarations
(preliminaries)
• specify type of positions
%pos int * int
• specify terminal and nonterminal symbols
%term IF | THEN | ELSE | PLUS | MINUS ...
%nonterm prog | exp | op
• specify end-of-parse token
%eop EOF
• specify start symbol (by default, the nonterminal on the LHS
of the first rule)
%start prog
Simple ML-Yacc Example
%%
(* grammar symbols *)
%term NUM | PLUS | MUL | LPAR | RPAR | EOF
%nonterm exp | fact | base
%pos int
%start exp
%eop EOF
%%
(* grammar rules; semantic actions (currently do nothing) *)
exp  : fact               ()
     | fact PLUS exp      ()
fact : base               ()
     | base MUL fact      ()
base : NUM                ()
     | LPAR exp RPAR      ()
attribute-grammars
• ML-Yacc uses an attribute-grammar scheme
– each nonterminal may have a semantic value
associated with it
– when the parser reduces with (X ::= s)
• a semantic action will be executed
• uses semantic values from symbols in s
– when parsing is completed successfully
• parser returns semantic value associated with the
start symbol
• usually a parse tree
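To make the reduce step concrete, here is a minimal Standard ML sketch of a semantic-value stack. It is not ML-Yacc's generated code: the names svalue, reduceBy and addAction are hypothetical, and semantic values are assumed to be plain integers.

```sml
(* Hypothetical semantic values: an int for symbols that carry one,
   NoValue for terminals such as PLUS. *)
datatype svalue = Val of int | NoValue

exception BadStack

(* Reduce by a rule X ::= s whose right-hand side s has n symbols:
   pop the n topmost values, run the semantic action on them
   (leftmost symbol first), and push the value computed for X. *)
fun reduceBy (n : int, action : svalue list -> svalue)
             (stack : svalue list) : svalue list =
  if length stack < n then raise BadStack
  else action (rev (List.take (stack, n))) :: List.drop (stack, n)

(* Semantic action for exp ::= fact PLUS exp, computing an integer. *)
fun addAction [Val f, NoValue, Val e] = Val (f + e)
  | addAction _ = raise BadStack

(* Stack (top first) holding the values for: fact(1) PLUS exp(2). *)
val stackBefore = [Val 2, NoValue, Val 1]
val stackAfter = reduceBy (3, addAction) stackBefore
```

After the reduction the three values for the right-hand side are gone and a single value for the left-hand side sits on top of the stack.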
attribute-grammars
• semantic actions typically build the abstract syntax for the
internal language
• to use semantic values during parsing, we must declare
symbol types:
– %term NUM of int | PLUS | MUL | ...
– %nonterm exp of int | fact of int | base of int
• type of semantic action must match type declared for LHS
nonterminal in rule
ML-Yacc with Semantic Actions
%%
(* grammar symbols with type declarations *)
%term NUM of int | PLUS | MUL | LPAR | RPAR | EOF
%nonterm exp of int | fact of int | base of int
%pos int
%start exp
%eop EOF
%%
(* grammar rules with semantic actions: computing integer result via semantic actions *)
exp  : fact               (fact)
     | fact PLUS exp      (fact + exp)
fact : base               (base)
     | base MUL fact      (base * fact)
base : NUM                (NUM)
     | LPAR exp RPAR      (exp)
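The values these semantic actions compute can be imitated by hand. The sketch below is a recursive-descent evaluator in Standard ML, not the LR parser ML-Yacc would generate. It assumes the usual layered grammar (exp is fact or fact PLUS exp, fact is base or base MUL fact), and the token datatype and function names are illustrative.

```sml
(* Tokens corresponding to the %term declaration (illustrative). *)
datatype token = NUM of int | PLUS | MUL | LPAR | RPAR

exception ParseError

(* One function per nonterminal. Each returns the semantic value of the
   phrase it recognized together with the tokens that remain. *)
fun exp ts =
  let val (f, rest) = fact ts
  in case rest of
       PLUS :: rest' =>
         let val (e, rest'') = exp rest'
         in (f + e, rest'') end            (* exp : fact PLUS exp  (fact + exp) *)
     | _ => (f, rest)                      (* exp : fact           (fact) *)
  end
and fact ts =
  let val (b, rest) = base ts
  in case rest of
       MUL :: rest' =>
         let val (f, rest'') = fact rest'
         in (b * f, rest'') end            (* fact : base MUL fact (base * fact) *)
     | _ => (b, rest)                      (* fact : base          (base) *)
  end
and base (NUM n :: rest) = (n, rest)       (* base : NUM           (NUM) *)
  | base (LPAR :: rest) =
      (case exp rest of
         (e, RPAR :: rest') => (e, rest')  (* base : LPAR exp RPAR (exp) *)
       | _ => raise ParseError)
  | base _ = raise ParseError

(* The whole input must be consumed. *)
fun parse ts =
  case exp ts of
    (v, []) => v
  | _ => raise ParseError

val result = parse [NUM 1, PLUS, NUM 2, MUL, NUM 3]
```

Because multiplication lives one level lower in the grammar than addition, the evaluator multiplies before it adds, with no precedence directives needed.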
ML-Yacc with Semantic Actions
(* computing abstract syntax via semantic actions *)
datatype exp =
  Int of int | Add of exp * exp | Mul of exp * exp
%%
...
%%
exp  : fact               (fact)
     | fact PLUS exp      (Add (fact, exp))
fact : base               (base)
     | base MUL fact      (Mul (base, fact))
base : NUM                (Int NUM)
     | LPAR exp RPAR      (exp)
A simpler grammar
datatype exp =
  Int of int | Add of exp * exp | Mul of exp * exp
%%
...
%%
exp : NUM                 (Int NUM)
    | exp PLUS exp        (Add (exp1, exp2))
    | exp MUL exp         (Mul (exp1, exp2))
    | LPAR exp RPAR       (exp)
why don’t we just use this simpler grammar?
this grammar is ambiguous!
NUM + NUM * NUM has two parse trees:
tree 1, NUM + (NUM * NUM):  E( E(NUM) + E( E(NUM) * E(NUM) ) )
tree 2, (NUM + NUM) * NUM:  E( E( E(NUM) + E(NUM) ) * E(NUM) )
But it is so clean that it would be nice to use. Moreover, we
know which parse tree we want. We just need a mechanism
to specify it!
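Why the choice of parse tree matters can be seen by evaluating both candidates. The following Standard ML sketch writes the two trees for 1 + 2 * 3 by hand, using the exp datatype from the slides. The eval function and the concrete numbers are assumptions added for illustration.

```sml
datatype exp =
    Int of int
  | Add of exp * exp
  | Mul of exp * exp

(* A small interpreter over the abstract syntax. *)
fun eval (Int n) = n
  | eval (Add (a, b)) = eval a + eval b
  | eval (Mul (a, b)) = eval a * eval b

(* The two trees an ambiguous grammar allows for 1 + 2 * 3. *)
val mulFirst = Add (Int 1, Mul (Int 2, Int 3))   (* 1 + (2 * 3) *)
val addFirst = Mul (Add (Int 1, Int 2), Int 3)   (* (1 + 2) * 3 *)

val results = (eval mulFirst, eval addFirst)
```

The two trees evaluate to different integers, so the parser must be told which one to build.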
Recall how LR parsing works:
exp ::= NUM
      | exp PLUS exp
      | exp MUL exp
      | LPAR exp RPAR
Input from lexer: NUM + NUM * NUM
desired parse tree: NUM + (NUM * NUM)
We have a shift-reduce conflict.
What should we do to get the right parse?
State of parse so far      yet to read      next action
E + E                      * NUM            SHIFT (resolves the conflict)
E + E *                    NUM              SHIFT
E + E * NUM                                 REDUCE (exp ::= NUM)
E + E * E                                   REDUCE (exp ::= exp MUL exp)
E + E                                       REDUCE (exp ::= exp PLUS exp)
E                                           (parse complete)
The alternative parse
exp ::= NUM
      | exp PLUS exp
      | exp MUL exp
      | LPAR exp RPAR
Input from lexer: NUM + NUM * NUM
We have a shift-reduce conflict.
Suppose we REDUCE next
State of parse so far      yet to read      next action
E + E                      * NUM            REDUCE (exp ::= exp PLUS exp)
E                          * NUM            SHIFT
E *                        NUM              SHIFT
E * NUM                                     REDUCE (exp ::= NUM)
E * E                                       REDUCE (exp ::= exp MUL exp)
E                                           (parse complete)
resulting parse tree: (NUM + NUM) * NUM
Summary
Input from lexer: NUM + NUM * NUM
desired parse tree: NUM + (NUM * NUM)
State of parse so far: E + E        yet to read: * NUM
We have a shift-reduce conflict.
We have E + E on stack, we see *.
We want to shift. We ALWAYS want to
shift since * has higher precedence than +
==> symbols to the right on the stack get processed first
Example 2
exp ::= NUM
      | exp PLUS exp
      | exp MUL exp
      | exp MINUS exp
      | LPAR exp RPAR
Input from lexer: NUM - NUM - NUM
We have a shift-reduce conflict.
We have E - E on stack, we see -.
We want "-" to be a left-associative operator.
ie: NUM - NUM - NUM == ((NUM - NUM) - NUM)
What do we do?
State of parse so far      yet to read      next action
E - E                      - NUM            REDUCE (exp ::= exp MINUS exp)
E                          - NUM            SHIFT
E -                        NUM              SHIFT
E - NUM                                     REDUCE (exp ::= NUM)
E - E                                       REDUCE (exp ::= exp MINUS exp)
E                                           (parse complete)
Example 2: Summary
We have a shift-reduce conflict.
We have E - E on stack, we see -.
What do we do? REDUCE. We ALWAYS
want to reduce since - is left-associative.
precedence and associativity
• three solutions to dealing with operator
precedence and associativity:
1) let Yacc complain.
• its default choice is to shift when it encounters a shift-reduce
conflict
• BAD: programmer intentions unclear; harder to debug other
parts of your grammar; generally inelegant
2) rewrite the grammar to eliminate ambiguity
• can be complicated and less clear
3) use Yacc precedence directives
• %left, %right, %nonassoc
precedence and associativity
• given directives, ML-Yacc assigns precedence to each
terminal and rule
– precedence of terminal based on order in which associativity is
specified (terminals in later directives get higher precedence)
– precedence of rule is the precedence of the right-most terminal
• eg: precedence of (E ::= E + E) == prec(+)
• a shift-reduce conflict is resolved as follows
– prec(terminal) > prec(rule) ==> shift
– prec(terminal) < prec(rule) ==> reduce
– prec(terminal) = prec(rule) ==>
• assoc(terminal) = left ==> reduce
• assoc(terminal) = right ==> shift
• assoc(terminal) = nonassoc ==> report as error
RHS of rule on stack: ... E % E        yet to read (terminal T next): T E ...
(% is some operator)
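The resolution rule above is small enough to write down directly. The Standard ML sketch below encodes it as a function. The names assoc, action and resolve and the numeric precedence levels are assumptions for illustration, not part of ML-Yacc's interface.

```sml
datatype assoc = Left | Right | NonAssoc
datatype action = Shift | Reduce | Error

(* termPrec and termAssoc describe the lookahead terminal;
   rulePrec is the precedence of the rule whose RHS is on the stack. *)
fun resolve {termPrec : int, termAssoc : assoc, rulePrec : int} : action =
  if termPrec > rulePrec then Shift
  else if termPrec < rulePrec then Reduce
  else
    case termAssoc of
      Left => Reduce
    | Right => Shift
    | NonAssoc => Error

(* Assumed levels: a higher number means higher precedence. *)
val plusLevel = 1
val mulLevel = 2

(* E + E on the stack, * is next. *)
val seeMul = resolve {termPrec = mulLevel, termAssoc = Left, rulePrec = plusLevel}
(* E + E on the stack, another left-associative + is next. *)
val seePlus = resolve {termPrec = plusLevel, termAssoc = Left, rulePrec = plusLevel}
```

With these levels the first call yields Shift and the second yields Reduce, matching the two cases worked through by hand earlier.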
precedence and associativity
datatype exp =
  Int of int | Add of exp * exp | Sub of exp * exp | Mul of exp * exp | Div of exp * exp
%%
%left PLUS MINUS
%left MUL DIV
%%
exp : NUM                 (Int NUM)
    | exp PLUS exp        (Add (exp1, exp2))
    | exp MINUS exp       (Sub (exp1, exp2))
    | exp MUL exp         (Mul (exp1, exp2))
    | exp DIV exp         (Div (exp1, exp2))
    | LPAR exp RPAR       (exp)
precedence and associativity
precedence directives:
%left PLUS MINUS
%left MUL DIV
Case 1:
RHS of rule on stack: ... E PLUS E        yet to read (terminal T next): MUL E ...
prec(MUL) > prec(PLUS) ==> SHIFT
Case 2:
RHS of rule on stack: ... E PLUS E        yet to read (terminal T next): MINUS E ...
prec(PLUS) = prec(MINUS) and MINUS is left-associative ==> REDUCE
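The effect of these two directives can be tried out with a simplified shift-reduce loop. The Standard ML sketch below is an operator-precedence parser written by hand, not ML-Yacc's table-driven LR parser. It handles only numbers and the four binary operators, and the names token, item, prec, build, reduce and parse are hypothetical.

```sml
datatype exp =
    Int of int
  | Add of exp * exp | Sub of exp * exp
  | Mul of exp * exp | Div of exp * exp

datatype token = NUM of int | PLUS | MINUS | MUL | DIV

exception ParseError

(* Levels assumed from:  %left PLUS MINUS  then  %left MUL DIV *)
fun prec PLUS = 1 | prec MINUS = 1 | prec MUL = 2 | prec DIV = 2
  | prec (NUM _) = raise ParseError

(* Semantic actions for the binary-operator rules. *)
fun build (PLUS, l, r) = Add (l, r)
  | build (MINUS, l, r) = Sub (l, r)
  | build (MUL, l, r) = Mul (l, r)
  | build (DIV, l, r) = Div (l, r)
  | build _ = raise ParseError

(* Stack items: a shifted operator or an already parsed expression. *)
datatype item = Op of token | E of exp

(* Reduce the "E op E" on top of the stack (top of stack first). *)
fun reduce (E r :: Op t :: E l :: rest) = E (build (t, l, r)) :: rest
  | reduce _ = raise ParseError

fun parse tokens =
  let
    (* shift NUM and immediately reduce by exp ::= NUM *)
    fun loop (stack, NUM n :: ts) = loop (E (Int n) :: stack, ts)
      (* conflict point: E op E on the stack, operator t is next *)
      | loop (stack as (E _ :: Op top :: _), t :: ts) =
          if prec t > prec top
          then loop (Op t :: stack, ts)        (* shift *)
          else loop (reduce stack, t :: ts)    (* reduce; ties reduce since all are %left *)
      | loop (stack as [E _], t :: ts) = loop (Op t :: stack, ts)
      | loop ([E e], []) = e
      | loop (stack as (_ :: _ :: _), []) = loop (reduce stack, [])
      | loop _ = raise ParseError
  in
    loop ([], tokens)
  end

val t1 = parse [NUM 1, PLUS, NUM 2, MUL, NUM 3]
val t2 = parse [NUM 1, MINUS, NUM 2, MINUS, NUM 3]
```

Here t1 groups the multiplication first because MUL has the higher level, and t2 groups to the left because equal precedence with left associativity means reduce.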
one more example
datatype exp =
  Int of int
  | Add of exp * exp | Sub of exp * exp
  | Mul of exp * exp | Div of exp * exp
  | Uminus of exp
%%
%left PLUS MINUS
%left MUL DIV
%%
exp : NUM                 (Int NUM)
    | MINUS exp           (Uminus exp)
    | exp PLUS exp        (Add (exp1, exp2))
    | exp MINUS exp       (Sub (exp1, exp2))
    | exp MUL exp         (Mul (exp1, exp2))
    | exp DIV exp         (Div (exp1, exp2))
    | LPAR exp RPAR       (exp)
RHS of rule on stack: ... MINUS E        yet to read: MUL E ...
what happens?
prec(*) > prec(-) ==> we SHIFT
so MINUS E MUL E is parsed as -(E * E): unary minus binds more loosely than *
the fix
datatype exp =
  Int of int
  | Add of exp * exp | Sub of exp * exp
  | Mul of exp * exp | Div of exp * exp
  | Uminus of exp
%%
%left PLUS MINUS
%left MUL DIV
%left UMINUS
%%
exp : NUM                         (Int NUM)
    | MINUS exp %prec UMINUS      (Uminus exp)
    | exp PLUS exp                (Add (exp1, exp2))
    | exp MINUS exp               (Sub (exp1, exp2))
    | exp MUL exp                 (Mul (exp1, exp2))
    | exp DIV exp                 (Div (exp1, exp2))
    | LPAR exp RPAR               (exp)
RHS of rule on stack: ... MINUS E        yet to read: MUL E ...
changing precedence of rule
alters decision:
prec(UMINUS) > prec(MUL) ==>
we REDUCE
the dangling else problem
• Grammar:
S ::= if E then S else S
| if E then S
| ...
• Consider: if a then if b then S else S
– parse 1: if a then (if b then S else S)
– parse 2: if a then (if b then S) else S
• Parser reports shift-reduce conflict
– in default behavior: shift (what we want)
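The two parses are not just different trees: they behave differently. The Standard ML sketch below writes both by hand and runs them. The stmt datatype, the run function, and the use of boolean constants for the conditions a and b are all simplifying assumptions made for illustration.

```sml
datatype stmt =
    Print of string
  | If of bool * stmt                  (* if E then S *)
  | IfElse of bool * stmt * stmt       (* if E then S else S *)

(* "Run" a statement, returning the list of strings it would print. *)
fun run (Print s) = [s]
  | run (If (c, s)) = if c then run s else []
  | run (IfElse (c, s1, s2)) = if c then run s1 else run s2

(* parse 1: the else belongs to the inner if (what shifting produces) *)
fun parse1 (a, b) = If (a, IfElse (b, Print "then", Print "else"))
(* parse 2: the else belongs to the outer if *)
fun parse2 (a, b) = IfElse (a, If (b, Print "then"), Print "else")

(* With a false, the two readings disagree. *)
val r1 = run (parse1 (false, true))
val r2 = run (parse2 (false, true))
```

Under parse 1 nothing is printed when a is false, while under parse 2 the else branch runs.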
the dangling else problem
• Grammar:
S ::= if E then S else S
| if E then S
| ...
• Alternative solution is to rewrite grammar:
S ::= M
    | U
M ::= if E then M else M
    | ...
U ::= if E then S
    | if E then M else U
default behavior of ML-Yacc
• Shift-Reduce conflict
– shift
• Reduce-Reduce conflict
– reduce by first rule
– generally considered unacceptable
• for assignment 3, your job is to write a
grammar for Fun such that there are no
conflicts
– you may use precedence directives tastefully
Note: To enter ML-Yacc hell,
use a parser to catch type errors
• when doing assignment 3, your job is to catch
parse errors
• there are lots of programming errors that will slip
by the parser:
– eg: 3 + true
– catching these sorts of errors is the job of the type
checker
– just as catching program structure errors was the job
of the parser, not the lexer
– attempting to do type checking in the parser is
impossible (in general)
• why? Hint: what does “context-free grammar” imply?
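The division of labor can be sketched as a separate pass over the abstract syntax. In the Standard ML example below, the parser is assumed to have already accepted 3 + true and built a tree for it, and a small checker then rejects it. The datatypes and the typeOf function are hypothetical and much smaller than what a real language would need.

```sml
datatype exp =
    Int of int
  | Bool of bool
  | Add of exp * exp

datatype ty = IntTy | BoolTy

exception TypeError of string

(* Compute the type of an expression, or raise TypeError. *)
fun typeOf (Int _) = IntTy
  | typeOf (Bool _) = BoolTy
  | typeOf (Add (a, b)) =
      (case (typeOf a, typeOf b) of
         (IntTy, IntTy) => IntTy
       | _ => raise TypeError "operands of + must be int")

(* 3 + true is grammatical, so a parser would build this tree. *)
val bad = Add (Int 3, Bool true)
val good = Add (Int 3, Int 4)

(* The type checker, not the parser, rejects it. *)
val rejected = (typeOf bad; false) handle TypeError _ => true
val accepted = (typeOf good = IntTy)
```

Keeping the check in its own pass lets the grammar stay small and conflict-free while the checker works over the finished tree.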