More yacc
What is yacc
– Tool to produce a parser given a grammar
– YACC (Yet Another Compiler Compiler) is a
program that compiles an LALR(1) grammar
and produces the source code of a syntactic
analyzer (parser) for the language generated
by that grammar
– Input is a grammar (rules) and actions to
take upon recognizing a rule
– Output is a C program and optionally a
header file of tokens
Works with lex
• Lex is a scanner generator
• Input is description of patterns and actions
• Output is a C program which contains a function
yylex() which, when called, matches patterns
in the input and performs the associated actions
• Typically, the generated scanner performs lexical
analysis and produces tokens for the (YACC-generated) parser
Structure of a YACC File
• Has the same three-part structure as Lex
• The parts are separated from each other by a line containing %%
• The three parts even play the same roles:
– definition section
– rules section
– code section (copied directly into the
generated program)
Definition Section
• Declare the tokens used in the grammar and the
types of values used on the stack here
• Tokens that are single-quoted characters, like
'=' or '+', need not be declared.
• Literal C code can be included in a block in this
section using %{ … %}
Declaring Tokens
• The named tokens that are used in the grammar
must be declared
• Include lines like the one below in the
definition section:
%token CHARSTRING INT IDENTIFIER
%token LPAREN RPAREN
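As a rough illustration of what these declarations turn into, the sketch below hand-writes the kind of token-code definitions that end up in the generated header. The numeric values are assumptions chosen for illustration (the tool assigns the real ones), and token_name() is a hypothetical helper, not something YACC provides.

```c
#include <assert.h>

/* Hand-written stand-in for the token codes a generated header defines.
   The numbers are illustrative assumptions. Named tokens are numbered
   above 255 so they cannot collide with single-character tokens like '='. */
enum token_code {
    CHARSTRING = 258,
    INT,
    IDENTIFIER,
    LPAREN,
    RPAREN
};

/* Hypothetical helper (not part of YACC): printable name for a token code. */
const char *token_name(int code)
{
    static char single[2];

    switch (code) {
    case CHARSTRING: return "CHARSTRING";
    case INT:        return "INT";
    case IDENTIFIER: return "IDENTIFIER";
    case LPAREN:     return "LPAREN";
    case RPAREN:     return "RPAREN";
    default:
        /* A single-character token is represented by its character code. */
        assert(code > 0 && code < 256);
        single[0] = (char)code;
        single[1] = '\0';
        return single;
    }
}
```

Because named tokens and character tokens share one integer space, the scanner can return either kind from the same function.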
The Rules Section
• The rules of the grammar are placed here.
• Here is an example of the basic syntax:
Grammar production:
Expr → INTEGER + INTEGER | INTEGER - INTEGER
YACC grammar definition:
expr : INTEGER '+' INTEGER {action}
| INTEGER '-' INTEGER {action}
;
YACC Actions
• Similar to Lex, actions can be defined that are
performed whenever a production is applied to
the stream of tokens.
• An action is usually written in braces after the
production it belongs to.
• Since every symbol in the grammar has a
corresponding value, actions need to
access those values.
• This is done by accessing the YACC stack.
Accessing the Stack
• Since YACC generates an LR parser, it will
push the symbols that it reads along with
their values on a stack until it is ready to
reduce.
• To access these values in an action, write a
dollar sign followed by a number: $1 is the
value of the first symbol on the right-hand
side of the production, $2 the second, and
so on.
Accessing the Stack
$$ refers to the value of the left-hand nonterminal
expr : INTEGER '+' INTEGER {$$ = $1 + $3;}
| INTEGER '-' INTEGER {$$ = $1 - $3;}
;
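To make the stack access concrete, here is a minimal sketch in plain C of what a reduction by this rule does to the value stack. It is a toy model under simplifying assumptions: a real YACC parser also keeps a parallel state stack, and the names value_stack, push_value, reduce_expr and eval_binary are hypothetical.

```c
#include <assert.h>

#define STACK_MAX 16

/* Toy value stack holding one value per shifted symbol. */
static int value_stack[STACK_MAX];
static int sp = 0;   /* number of values currently on the stack */

static void push_value(int v)
{
    assert(sp < STACK_MAX);
    value_stack[sp++] = v;
}

/* Reduce by  expr : INTEGER '+' INTEGER  or  expr : INTEGER '-' INTEGER.
   The three right-hand-side values are on top of the stack, $1 deepest. */
int reduce_expr(char op)
{
    assert(sp >= 3);
    int d1 = value_stack[sp - 3];   /* $1: first INTEGER */
    int d3 = value_stack[sp - 1];   /* $3: second INTEGER ($2 is the operator) */
    int result = (op == '+') ? d1 + d3 : d1 - d3;   /* $$ */

    sp -= 3;               /* pop the right-hand side */
    push_value(result);    /* push the value of expr in its place */
    return result;
}

/* Shift the three symbols of "a op b", reduce, then pop the result. */
int eval_binary(int a, char op, int b)
{
    push_value(a);
    push_value(0);         /* the operator token carries no useful value */
    push_value(b);
    int r = reduce_expr(op);
    sp--;                  /* leave the stack empty for the next call */
    return r;
}
```

The point of the sketch is the indexing: the number after the dollar sign counts symbols from the left of the production, not positions from the top of the stack.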
Tokens and values come from lex
[Diagram: the YACC-generated yyparse() calls the Lex-generated yylex() to obtain each token and its value]
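The calling relationship can be sketched in plain C with hand-written stand-ins for both generated functions. Everything here is an assumption made for illustration: the canned token stream, set_input(), and parse_expr() (which plays the role of yyparse() for the single rule used in these slides) are hypothetical, and the token code 258 is arbitrary.

```c
#include <assert.h>
#include <stddef.h>

enum { INTEGER = 258 };   /* illustrative token code */

int yylval;               /* value of the most recent token, set by the scanner */

/* Canned token stream standing in for a Lex-generated scanner's input. */
struct canned { int token; int value; };
static const struct canned *stream;   /* next token to hand out */

void set_input(const struct canned *tokens) { stream = tokens; }

/* Stand-in for yylex(): return the next token code, or 0 at end of input. */
int yylex(void)
{
    assert(stream != NULL);
    if (stream->token == 0)
        return 0;
    yylval = stream->value;
    return (stream++)->token;
}

/* Stand-in for yyparse(): recognise INTEGER ('+' | '-') INTEGER.
   Returns 0 on success and 1 on a syntax error; the result goes to *out. */
int parse_expr(int *out)
{
    int left, op, right;

    if (yylex() != INTEGER) return 1;
    left = yylval;                 /* read the value before the next yylex() call */
    op = yylex();
    if (op != '+' && op != '-') return 1;
    if (yylex() != INTEGER) return 1;
    right = yylval;
    if (yylex() != 0) return 1;    /* expect end of input */

    *out = (op == '+') ? left + right : left - right;
    return 0;
}
```

The parser pulls tokens one at a time, and it must copy yylval before asking for the next token, because each call to the scanner may overwrite it.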
Revisiting Lex
• The Lex file has to be modified in two main
places to work with the YACC parser.
• In the definition section, include this
statement: #include "y.tab.h"
• That is a header file created by YACC when
the parser is generated with the -d option.
• The actions for the rules need to be
changed too.
Revisiting Lex Actions
• For tokens with a value, assign that value
to yylval. YACC can read the value from
that variable.
• Include a return statement for the token
name (this is the same name that is
defined at the top of the YACC file).
if            {return IF;}
[1-9][0-9]*   {yylval = atoi(yytext); return INTEGER;}
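For comparison, the sketch below hand-writes in C roughly what a scanner built from those two rules does. It is not what Lex actually generates (Lex emits a table-driven matcher). The scan_string() helper, the token codes, the blank skipping, and the fallback of returning any other character as its own token are all assumptions added for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

enum { IF = 258, INTEGER };   /* illustrative token codes */

int yylval;                   /* value of the last INTEGER token */
static const char *cursor;    /* current position in the input text */

/* Hypothetical helper: point the scanner at a string. */
void scan_string(const char *text) { cursor = text; }

/* Hand-written stand-in for yylex(). */
int yylex(void)
{
    assert(cursor != NULL);
    while (*cursor == ' ')                 /* assumption: blanks are skipped */
        cursor++;
    if (*cursor == '\0')
        return 0;                          /* end of input */

    /* rule:  if  {return IF;} */
    if (strncmp(cursor, "if", 2) == 0) {
        cursor += 2;
        return IF;
    }

    /* rule:  [1-9][0-9]*  {yylval = atoi(yytext); return INTEGER;} */
    if (*cursor >= '1' && *cursor <= '9') {
        yylval = atoi(cursor);             /* atoi stops at the first non-digit */
        while (isdigit((unsigned char)*cursor))
            cursor++;
        return INTEGER;
    }

    /* assumption: any other character is returned as its own token code */
    return (unsigned char)*cursor++;
}
```

The return value tells the parser which token was seen, while yylval carries the token's value on the side.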
The %union Declaration
• Different tokens have different data types.
• INTEGER tokens are integers, FLOAT tokens are
floats, CHARACTERSTRING tokens are char *, and
IDENTIFIER tokens are pointers to the entry in
the symbol table for that identifier.
• The %union will allow the parser to apply
the right data type to the right token.
The %union Declaration
YACC Definition Section
%union {
int intValue;
float floatValue;
}
%token <intValue> INTEGER
%token <floatValue> FLOAT
Lex Rules Section
… {yylval.intValue = atoi(yytext); return INTEGER;}
… {yylval.floatValue = atof(yytext); return FLOAT;}
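On the C side, the %union above corresponds roughly to an ordinary C union used as the type of yylval. The sketch below writes that out by hand under stated assumptions: the token codes are arbitrary, and set_token_value() is a hypothetical helper that stands in for the two Lex actions.

```c
#include <assert.h>
#include <stdlib.h>

/* Roughly what  %union { int intValue; float floatValue; }  becomes in C. */
typedef union {
    int   intValue;
    float floatValue;
} YYSTYPE;

enum { INTEGER = 258, FLOAT };   /* illustrative token codes */

YYSTYPE yylval;                  /* one slot, interpreted per token type */

/* Hypothetical helper mirroring the two Lex actions: store the value in the
   union member that matches the token, then return the token code. */
int set_token_value(int token, const char *text)
{
    assert(token == INTEGER || token == FLOAT);
    if (token == INTEGER)
        yylval.intValue = atoi(text);
    else
        yylval.floatValue = (float)atof(text);
    return token;
}
```

Only the member that was last written is meaningful, which is why the %token <intValue> and %token <floatValue> declarations matter: they tell the parser which member to read for each token.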