More LR Parsing and Bison
CPSC 388
Ellen Walker
Hiram College
More than SLR(1)
• SLR(k) Parsing
– Multiple-token lookahead (for shifts) and multiple-token follow information (for reductions)
• General LR(1) parsing
– Include lookaheads in DFA construction
• LALR(1) parsing
– Simplified state diagram for general LR(1)
– What YACC / Bison uses
LALR: LR(0) + Lookahead
• NFA states are [ LR(0) item , lookahead]
– Examples: [S->.(S), $] , [S->.a,)]
– The symbol after the comma is the lookahead: a token that may follow the RHS
• Building DFA
– [S->.(S),$] --(--> [S->(.S),$] same as SLR
– [S->(.S),$] --ε--> [S->.(S),)] propagates the lookahead
• One ε-transition for every S-rule and every token in FIRST of what follows S in the original item (the rest of the RHS, then the item's lookahead)
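The ε-transition rule above can be sketched as a closure computation over LR(1) items. This is a minimal illustration for the slide's grammar only, not what Bison does internally: the names Rule, kRules, first, Item and closure are hypothetical, 'Z' stands in for S', and FIRST is simplified under the assumption that the grammar has no empty rules and no left recursion.

```cpp
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Grammar from the slide, augmented: Z -> S, S -> ( S ) | a.
// 'Z' stands in for S' (an assumption of this sketch); '$' marks end of input.
struct Rule { char lhs; std::string rhs; };
const std::vector<Rule> kRules = { {'Z', "S"}, {'S', "(S)"}, {'S', "a"} };

bool isNonterminal(char c) { return c == 'S' || c == 'Z'; }

// FIRST of one symbol. Simplification: assumes no empty rules and no left
// recursion, which holds for this grammar but not in general.
std::set<char> first(char sym) {
    if (!isNonterminal(sym)) return {sym};
    std::set<char> out;
    for (const Rule& r : kRules) {
        if (r.lhs != sym) continue;
        std::set<char> f = first(r.rhs[0]);
        out.insert(f.begin(), f.end());
    }
    return out;
}

// LR(1) item: (rule index, dot position, lookahead token).
using Item = std::tuple<int, int, char>;

// Closure of a set of items under the epsilon-transitions described above.
std::set<Item> closure(std::set<Item> items) {
    std::vector<Item> work(items.begin(), items.end());
    while (!work.empty()) {
        Item it = work.back();
        work.pop_back();
        const std::string& rhs = kRules[std::get<0>(it)].rhs;
        std::size_t dot = static_cast<std::size_t>(std::get<1>(it));
        if (dot >= rhs.size() || !isNonterminal(rhs[dot])) continue;
        // Lookaheads: FIRST of the symbol after the nonterminal, or the
        // item's own lookahead when nothing follows the nonterminal.
        std::set<char> las = (dot + 1 < rhs.size())
            ? first(rhs[dot + 1]) : std::set<char>{std::get<2>(it)};
        for (int j = 0; j < static_cast<int>(kRules.size()); ++j) {
            if (kRules[j].lhs != rhs[dot]) continue;
            for (char la : las) {
                Item next(j, 0, la);
                if (items.insert(next).second) work.push_back(next);
            }
        }
    }
    return items;
}
```

Calling closure on the single item [S->(.S),$], written Item{1, 1, '$'}, adds [S->.(S),)] and [S->.a,)], matching the transition shown on the slide.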
YACC / Bison
• “Yet Another Compiler-Compiler”
• Given a CFG, automatically creates the LALR(1) parsing table
• Using bison:
– Input file: grammar.y
– Output file: grammar.tab.c
– Generic main reads lines, executes rules
Structure of a Bison File
Definitions, including direct code in %{ %}
%%
Rules of the grammar, with actions
%%
Additional code, e.g.
int main() { return yyparse(); }
Example: Expression Calculator
• Rules describe the usual grammar
– S’ -> exp
– exp -> exp + term | exp - term | term
– term -> term * factor | factor
– factor -> NUMBER | ( exp )
• Associated actions execute the arithmetic
Bison Rule Syntax
• LHS followed by :
• Each alternative followed by its action; alternatives separated by |
• ; after the last action
• Example
factor : NUMBER {$$ = $1;}
       | '(' exp ')' {$$ = $2;}
       ;
Bison Actions
• Rules include actions in { } (code)
• Predefined variables:
– $$ value of result of rule (type YYSTYPE, int by default)
– $1 value of first RHS symbol, $2 value of second, etc.
• Example
– exp : exp '+' term {$$ = $1 + $3;}
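To make the $-variables concrete, here is a small sketch of what a reduction by exp : exp '+' term does to a stack of semantic values. ValueStack and reduceAdd are hypothetical names for illustration; in a real parser the generated yyparse() manages this stack, and the details differ.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the parser's stack of semantic values.
using ValueStack = std::vector<int>;

// Models the reduction  exp : exp '+' term  { $$ = $1 + $3; }
// The top three entries are the values of exp, '+', and term.
void reduceAdd(ValueStack& vs) {
    assert(vs.size() >= 3);          // the whole RHS must be on the stack
    int d1 = vs[vs.size() - 3];      // $1: value of exp
    int d3 = vs[vs.size() - 1];      // $3: value of term
    vs.resize(vs.size() - 3);        // pop the three RHS values
    vs.push_back(d1 + d3);           // push $$, the value of the new exp
}
```

The value of '+' ($2) is on the stack but unused, which is why the action skips from $1 to $3.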
Bison and Flex together
• Define tokens in definition section:
– %token ID val
– Choose values > 256
• Make sure lex.yy.c and grammar.tab.c agree on token ID definitions
#define ID val
• Compile both together
– g++ -o myparser grammar.tab.c lex.yy.c -lfl
Flex for Bison
• Each rule should return a token type
– E.g. return NUMBER;
• In addition, a token value can be saved
in the global variable yylval
– E.g. yylval = myAtoI(yytext);
Mixing Characters and Tokens
• Don’t assign token values < 256
• Allow characters to be their own tokens (rule):
.    return(yytext[0]);
• Or be specific:
[-+*()] return(yytext[0]);
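The reason for keeping named token values above 256 can be seen in a hand-written stand-in for the scanner. This is only a sketch of the convention, not flex output: nextToken is a hypothetical function, and the value 258 for NUMBER and the plain int yylval are assumptions (real values come from the header Bison generates).

```cpp
#include <cctype>
#include <string>

// Assumed token code; anything above 256 cannot collide with a character.
const int NUMBER = 258;
// Plays the role of the global yylval that carries a token's value.
int yylval = 0;

// Returns the next token type from `input`, advancing `pos`.
// Numbers return NUMBER with their value in yylval; any other
// non-space character is returned as its own character code.
int nextToken(const std::string& input, std::size_t& pos) {
    while (pos < input.size() &&
           std::isspace(static_cast<unsigned char>(input[pos]))) {
        ++pos;
    }
    if (pos >= input.size()) return 0;   // 0 signals end of input
    unsigned char c = static_cast<unsigned char>(input[pos]);
    if (std::isdigit(c)) {
        yylval = 0;
        while (pos < input.size() &&
               std::isdigit(static_cast<unsigned char>(input[pos]))) {
            yylval = yylval * 10 + (input[pos] - '0');
            ++pos;
        }
        return NUMBER;
    }
    ++pos;
    return c;                            // e.g. '+' or '(' is its own token
}
```

Because character tokens use their character codes, the grammar can refer to them directly as '+' or '(' without a %token declaration.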
A Few Gotchas
• Bison (and flex) like tabs, not spaces
• Beware of commenting out a closing } with //
• C/C++ requires functions to be declared before they are used
– Copy signatures to top of file
– “extern” for functions and variables defined in other files
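A short sketch of the declare-before-use point, using made-up names (countParens, errorCount, check). For the sake of a single self-contained file, the extern variable is defined further down in the same file; in a real flex/Bison project its definition would normally sit in a different source file.

```cpp
#include <string>

// Signatures copied to the top of the file, so later code may use them.
int countParens(const std::string& text);   // hypothetical helper, defined below
extern int errorCount;                       // hypothetical variable, defined elsewhere

// Uses both names before their definitions appear.
int check(const std::string& text) {
    int depth = countParens(text);
    if (depth != 0) ++errorCount;            // unbalanced parentheses
    return depth;
}

// Definitions. errorCount is defined here only to keep the sketch in one file.
int errorCount = 0;

// Returns the number of '(' minus the number of ')' in text.
int countParens(const std::string& text) {
    int depth = 0;
    for (char c : text) {
        if (c == '(') ++depth;
        else if (c == ')') --depth;
    }
    return depth;
}
```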
Bison Individual Homework
• Use Bison to parse and interpret simple LISP-like commands
– (cons a (cons b nil)) => (a b)
– (cons (cons a nil) (cons b nil)) => ((a) b)
– (car (cons a (cons b nil))) => a
– (cdr (cons a (cons b nil))) => (b)
– (cdr (cdr (cons a (cons b nil)))) => nil
• See handout for details
Error Handling in Bottom-Up Parsing
• Error = blank entry in parsing table
• To give specific error messages
– Many error entries (but bigger table!)
– Detect error before reducing when possible
• LR(1) is better than SLR(1) here
Recovery
• Panic mode:
– Pop states from the stack until the parse can be restarted
– Advance input until a legal transition is available
• Error productions
– Treat “error” as a pseudotoken
– Rules indicate how much to throw away
Error Example
• command : exp {cout << $1 << endl;}
          | error {yyerror("bad cmd");}
          ;
• Once a command is in error, the parser will
– Perform the error action
– Discard tokens until one that can legally follow command ($ here)
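The token-discarding step can be sketched on its own. This illustrates only that second step, not the full recovery a generated parser performs; discardUntilFollow is a hypothetical name, tokens are plain ints, and 0 is assumed to stand for end of input ($).

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Skips tokens starting at `pos` until one is in `follow`, the set of
// tokens that may legally come after the failed construct.
// Returns the number of tokens discarded; `pos` ends on the first token
// in `follow`, or at tokens.size() if none is found.
std::size_t discardUntilFollow(const std::vector<int>& tokens,
                               std::size_t& pos,
                               const std::set<int>& follow) {
    std::size_t discarded = 0;
    while (pos < tokens.size() && follow.count(tokens[pos]) == 0) {
        ++pos;
        ++discarded;
    }
    return discarded;
}
```

With a follow set containing only the end-of-input token, as in the command example, everything left on the line is thrown away.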