Flex – Lexical Analyzer Generator
Flex
A lexical analyzer generator: it generates a scanner procedure directly from regular expressions and user-written procedures
Steps to using flex
1. Create a description (rules) file for flex to operate on.
2. Run flex on the input file. flex produces a C file called lex.yy.c containing the scanning function yylex().
3. Run the C compiler on the C file to produce a lexical analyzer.
Flex Files and Procedure
Rule file (*.l) -> flex compiler -> lex.yy.c (scanner in C code)
lex.yy.c -> C compiler (linked with -lfl) -> scanner.exe
Test file -> scanner.exe -> tokens
Flex Programs
The flex input file consists of three sections, separated by lines containing only %%:
%{
auxiliary declarations
%}
regular definitions
%%
translation rules
%%
auxiliary procedures
Regular Expression Definitions Section
The definitions section contains declarations of simple name definitions, which simplify the scanner specification. A defined name is referenced in later patterns as {name}.
Name definitions have the form:
name    definition
Example:
DIGIT    [0-9]
ID       [a-z][a-z0-9]*
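As a rough illustration of what a name definition does, the C sketch below expands each {NAME} reference textually, wrapping the expansion in parentheses as flex does, and then checks a whole string against the result with the POSIX regcomp()/regexec() functions. The table defs[] and the helpers expand() and full_match() are hypothetical names, and POSIX extended regular expressions stand in for flex's own matcher; this is not how flex is implemented internally.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical table of name definitions, mirroring the slide's example. */
struct namedef { const char *name; const char *def; };
static const struct namedef defs[] = {
    { "DIGIT", "[0-9]" },
    { "ID",    "[a-z][a-z0-9]*" },
};

/* Replace each {NAME} in pat by "(definition)". Returns 0 on success,
   -1 on an unknown name, a missing '}', or if out is too small.
   Simplification: every {...} is treated as a name, so repetition
   counts such as {2,5} are NOT handled by this sketch. */
static int expand(const char *pat, char *out, size_t outsz) {
    size_t n = 0;
    if (outsz == 0)
        return -1;
    while (*pat) {
        if (*pat == '{') {
            const char *close = strchr(pat, '}');
            const char *def = NULL;
            if (close == NULL)
                return -1;
            size_t len = (size_t)(close - pat - 1);      /* length of NAME */
            for (size_t i = 0; i < sizeof defs / sizeof defs[0]; i++)
                if (strlen(defs[i].name) == len &&
                    strncmp(defs[i].name, pat + 1, len) == 0)
                    def = defs[i].def;
            if (def == NULL)
                return -1;                                /* unknown name */
            int w = snprintf(out + n, outsz - n, "(%s)", def);
            if (w < 0 || (size_t)w >= outsz - n)
                return -1;                                /* out too small */
            n += (size_t)w;
            pat = close + 1;
        } else {
            if (n + 1 >= outsz)
                return -1;
            out[n++] = *pat++;
        }
    }
    out[n] = '\0';
    return 0;
}

/* Does the whole string s match the (already expanded) POSIX ERE pattern? */
static int full_match(const char *pattern, const char *s) {
    char anchored[256];
    regex_t re;
    int ok;
    snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;
    ok = regexec(&re, s, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}
```

For example, expand("{DIGIT}+", ...) yields ([0-9])+. The parentheses matter: if a name stood for ab, then {NAME}* should repeat the whole ab, not just the b.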
Translation Rules Section
P1    {action1}
P2    {action2}
...
Pn    {actionn}
where each Pi is a regular expression and each actioni is a C program segment.
Auxiliary Procedure Section
This section is copied verbatim to lex.yy.c.
It is optional; if it is missing, the second %% in the input file may be omitted.
In the definitions and rules sections, any indented text, or text enclosed in %{ and %}, is also copied to the output (with the %{ and %} removed).
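Rules
Look for the longest token (e.g., a number: consume as many digits as possible)
If several patterns match that longest token, use the first-listed one (e.g., keywords and identifiers: the rule for if must precede {id})
List frequently occurring patterns first (e.g., white space)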
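The two matching rules can be sketched in C under the assumption that each rule is an anchored POSIX extended regular expression tried at the current input position: the longest match wins, and ties go to the rule listed first. The rule list and the names match_rule() and T_* are hypothetical; a real flex scanner compiles all rules into a single DFA instead of trying them one by one.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical token codes; each equals the index of its rule below. */
enum { T_WS, T_IF, T_ELSE, T_ID, T_NUM, T_ERR };

/* Rules in listing order (hypothetical subset of the slides' example). */
static const char *rule_pat[] = {
    "[ \t\n]+",               /* T_WS   */
    "if",                     /* T_IF   */
    "else",                   /* T_ELSE */
    "[A-Za-z][A-Za-z0-9]*",   /* T_ID   */
    "[0-9]+",                 /* T_NUM  */
};
#define NRULES (sizeof rule_pat / sizeof rule_pat[0])

/* Pick the rule for the token starting at s: the longest match wins, and
   among equally long matches the first-listed rule wins. Stores the match
   length in *len; returns T_ERR (with *len == 0) if no rule matches. */
static int match_rule(const char *s, size_t *len) {
    int best = T_ERR;
    size_t best_len = 0;
    for (size_t i = 0; i < NRULES; i++) {
        char anchored[128];
        regex_t re;
        regmatch_t m;
        snprintf(anchored, sizeof anchored, "^(%s)", rule_pat[i]);
        if (regcomp(&re, anchored, REG_EXTENDED) != 0)
            continue;
        /* Only a strictly longer match replaces the current best,
           so an earlier rule keeps any tie. */
        if (regexec(&re, s, 1, &m, 0) == 0 && (size_t)m.rm_eo > best_len) {
            best = (int)i;
            best_len = (size_t)m.rm_eo;
        }
        regfree(&re);
    }
    *len = best_len;
    return best;
}
```

With this rule order, match_rule("if x", &n) returns T_IF with n == 2 (the if rule and the identifier rule tie, and if is listed first), while match_rule("ifx", &n) returns T_ID with n == 3 because the identifier match is longer. Recompiling every pattern on each call keeps the sketch short but would be far too slow for real use.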
Rules
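View keywords as exceptions to the rule for identifiers:
construct a keyword table and look up each identifier in it
Lookahead operator: r1/r2 matches a string in r1 only if it is followed by a string in r2 (the text matched by r2 is not consumed)
DO 5 I = 1.25   (an assignment to the variable DO5I)
DO 5 I = 1, 25  (a DO loop)
DO/({letter}|{digit})*=({letter}|{digit})*,   (applied with blanks removed, since Fortran ignores blanks)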
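Both ideas on this slide can be hand-coded in C. The sketch below uses a hypothetical keyword table, searched by a hypothetical classify_word(), so that a single identifier rule covers keywords as well; and a hypothetical do_is_keyword() that performs the same test as the DO lookahead pattern on a statement after its blanks have been removed. The token codes TOK_* are also assumptions.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical token codes for this sketch. */
enum { TOK_ID, TOK_IF, TOK_ELSE, TOK_WHILE };

/* Keyword table: reserved words are looked up after an identifier has
   been scanned, instead of writing a separate rule for each keyword. */
static const struct { const char *word; int tok; } keywords[] = {
    { "if", TOK_IF }, { "else", TOK_ELSE }, { "while", TOK_WHILE },
};

/* Return the keyword token for lexeme, or TOK_ID if it is not reserved. */
static int classify_word(const char *lexeme) {
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(lexeme, keywords[i].word) == 0)
            return keywords[i].tok;
    return TOK_ID;
}

/* Hand-coded check equivalent to DO/({letter}|{digit})*=({letter}|{digit})*,
   Blanks are removed first. Returns 1 if DO starts a DO loop (keyword),
   0 if it is just the beginning of an identifier such as DO5I. */
static int do_is_keyword(const char *stmt) {
    char buf[256];
    size_t n = 0;
    for (; *stmt != '\0' && n + 1 < sizeof buf; stmt++)
        if (*stmt != ' ')
            buf[n++] = *stmt;
    buf[n] = '\0';
    if (strncmp(buf, "DO", 2) != 0)
        return 0;
    const char *p = buf + 2;                  /* trailing context starts after DO */
    while (isalnum((unsigned char)*p)) p++;   /* ({letter}|{digit})* */
    if (*p != '=')
        return 0;
    p++;
    while (isalnum((unsigned char)*p)) p++;   /* ({letter}|{digit})* */
    return *p == ',';                         /* the comma decides it */
}
```

As on the slide, "DO 5 I = 1, 25" makes do_is_keyword() return 1, while "DO 5 I = 1.25" returns 0 because a '.' rather than a ',' follows the digits after '='.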
Functions and Variables
yylex(): a function implementing the lexical analyzer and returning the token matched
yytext: a global pointer variable pointing to the lexeme matched
yyleng: a global variable giving the length of the lexeme matched
yylval: an external global variable storing the attribute of the token (normally defined by the yacc/bison-generated parser)
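To show how these names fit together, here is a small hand-written scanner in C that sets yytext, yyleng and yylval the way the slide describes. It is not flex output: it scans a string installed by a hypothetical set_input() function instead of reading yyin, recognizes only numbers, identifiers and single characters, and uses hypothetical token codes NUMBER, ID and ENDINPUT.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token codes; values above 255 cannot collide with
   single-character tokens, which are returned as their character code. */
enum { ENDINPUT = 0, NUMBER = 258, ID = 259 };

char *yytext;    /* points to the lexeme matched */
int   yyleng;    /* length of the lexeme matched */
long  yylval;    /* attribute of the token (here: the value of a NUMBER) */

static const char *input_pos;   /* hypothetical: scan a string, not yyin */
static char lexbuf[64];         /* NUL-terminated copy of the lexeme */

static void set_input(const char *s) { input_pos = s; }

/* Return the next token and set yytext, yyleng and yylval. */
int yylex(void) {
    while (isspace((unsigned char)*input_pos))   /* skip white space, no token */
        input_pos++;
    if (*input_pos == '\0')
        return ENDINPUT;                          /* 0 signals end of input */

    const char *start = input_pos;
    int tok;
    if (isdigit((unsigned char)*input_pos)) {
        while (isdigit((unsigned char)*input_pos)) input_pos++;
        tok = NUMBER;
    } else if (isalpha((unsigned char)*input_pos)) {
        while (isalnum((unsigned char)*input_pos)) input_pos++;
        tok = ID;
    } else {
        input_pos++;                              /* any other single character */
        tok = (unsigned char)*start;
    }

    size_t len = (size_t)(input_pos - start);
    if (len >= sizeof lexbuf)                     /* sketch only: truncate long lexemes */
        len = sizeof lexbuf - 1;
    memcpy(lexbuf, start, len);
    lexbuf[len] = '\0';
    yytext = lexbuf;
    yyleng = (int)len;
    if (tok == NUMBER)
        yylval = strtol(yytext, NULL, 10);        /* attribute: numeric value */
    return tok;
}
```

After set_input("x1 = 42;"), successive calls return ID (with yytext "x1" and yyleng 2), '=', NUMBER (with yylval 42), ';' and finally 0.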
Example
%{
#define ENDINPUT 0   /* end-of-input token; not EOF, which <stdio.h> already defines */
#define LE 25
...
%}
delim     [ \t\n]
ws        {delim}+
letter    [A-Za-z]
digit     [0-9]
id        {letter}({letter}|{digit})*
number    {digit}+(\.{digit}+)?(E[+\-]?{digit}+)?
%%
{ws}      { /* no action and no return */ }
if        {return (IF);}
else      {return (ELSE);}
{id}      {yylval=install_id(); return (ID);}
{number}  {yylval=install_num(); return (NUMBER);}
"<="      {yylval=LE; return (RELOP);}
"=="      {yylval=EQ; return (RELOP);}
...
<<EOF>>   {return (ENDINPUT);}
%%
int install_id() { ... }
int install_num() { ... }
Lexical Error Recovery
Error: none of the patterns matches a prefix of the remaining input
Panic-mode error recovery: delete successive characters from the remaining input until pattern matching can continue
Error repair:
delete an extraneous character
insert a missing character
replace an incorrect character
transpose two adjacent characters
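Panic-mode recovery can be sketched in C as a loop that, whenever no token can start at the current character, keeps deleting characters until one can, and reports one error for the deleted run. The token set here (identifiers only, separated by white space) and the function scan_with_panic() are hypothetical. In a flex specification, a catch-all rule for . listed last plays a similar role, although it discards one unmatched character at a time.

```c
#include <ctype.h>
#include <stddef.h>

/* Count identifier tokens in s, recovering from errors in panic mode.
   *errors receives the number of deleted runs of bad characters. */
static size_t scan_with_panic(const char *s, size_t *errors) {
    size_t tokens = 0;
    *errors = 0;
    while (*s) {
        if (isspace((unsigned char)*s)) {
            s++;                                   /* separator, no token */
        } else if (isalpha((unsigned char)*s)) {
            while (isalnum((unsigned char)*s)) s++;
            tokens++;                              /* one identifier */
        } else {
            /* No pattern matches here: delete characters until one can
               start a token (or a separator), then resume scanning. */
            while (*s && !isalpha((unsigned char)*s) && !isspace((unsigned char)*s))
                s++;
            (*errors)++;
        }
    }
    return tokens;
}
```

For the input "ab #$% cd", the run #$% is deleted as a single error and both identifiers are still found.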
Maintaining Line Numbers
Flex can maintain the number of the current line in the global variable yylineno. To enable this, put the following option in the first (definitions) section:
%option yylineno
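The bookkeeping behind this option amounts to counting the newlines in every matched lexeme. The C sketch below mimics it with a hypothetical counter lineno, which starts at 1 as yylineno does, and a hypothetical count_lines() that a hand-written scanner would call after each match with the lexeme text and length (yytext and yyleng in flex).

```c
#include <stddef.h>

/* Hypothetical stand-in for flex's yylineno; like yylineno it starts at 1. */
static int lineno = 1;

/* Advance the line counter past every newline inside the matched lexeme. */
static void count_lines(const char *lexeme, size_t len) {
    for (size_t i = 0; i < len; i++)
        if (lexeme[i] == '\n')
            lineno++;
}
```

A lexeme that spans lines, such as a multi-line comment, advances the counter once per newline it contains, so error messages can report the line on which the next token starts.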
Flex: Regular Expressions
x          match the character 'x'
.          any character (byte) except newline
[xyz]      a "character class"; in this case, the pattern matches either an 'x', a 'y', or a 'z'
[abj-oZ]   a "character class" with a range in it; matches an 'a', a 'b', any letter from 'j' through 'o', or a 'Z'
[^A-Z]     a "negated character class", i.e., any character but those in the class; in this case, any character EXCEPT an uppercase letter
[^A-Z\n]   any character EXCEPT an uppercase letter or a newline
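The character-class rows can be tried out with POSIX bracket expressions, which behave the same way for these examples. The helper matches() below is hypothetical; it anchors the pattern so that the whole string must match. One difference from flex: a POSIX . also matches a newline unless the REG_NEWLINE flag is passed to regcomp().

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: does the whole string s match the POSIX extended
   regular expression pattern? */
static int matches(const char *pattern, const char *s) {
    char anchored[256];
    regex_t re;
    int ok;
    snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;
    ok = regexec(&re, s, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}
```

In C source, "[^A-Z\n]" already contains a real newline character (the \n is a C escape), so the bracket expression excludes uppercase letters and newline just like the flex class.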
r*         zero or more r's, where r is any regular expression
r+         one or more r's
r?         zero or one r's (that is, "an optional r")
r{2,5}     anywhere from two to five r's
r{2,}      two or more r's
r{4}       exactly 4 r's
{name}     the expansion of the "name" definition (see above)
"[xyz]\"foo"   the literal string: [xyz]"foo
\X         if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v', then the ANSI-C interpretation of \x; otherwise, a literal 'X' (used to escape operators such as '*')
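The repetition operators have the same meaning in POSIX extended regular expressions, so their behaviour can be checked in C. POSIX ERE has no "..." quoting, so the literal string [xyz]"foo is written there by escaping the [ instead. The helper matches() is hypothetical; it anchors the pattern so that the whole string must match.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: does the whole string s match the POSIX extended
   regular expression pattern? */
static int matches(const char *pattern, const char *s) {
    char anchored[256];
    regex_t re;
    int ok;
    snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;
    ok = regexec(&re, s, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}
```

For instance, a{2,5} accepts aaa but rejects a and aaaaaa, and the C string "a\\*" (the regex a\*) matches only the two characters a*.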
\0         a NUL character (ASCII code 0)
\123       the character with octal value 123
\x2a       the character with hexadecimal value 2a
(r)        match an r; parentheses are used to override precedence (see below)
rs         the regular expression r followed by the regular expression s; called "concatenation"
r|s        either an r or an s
^r         an r, but only at the beginning of a line (i.e., when just starting to scan, or right after a newline has been scanned)
r$         an r, but only at the end of a line (i.e., just before a newline); equivalent to "r/\n"
Execute Flex
Create a directory in Cygwin, for example /usr/src/compiler
Download calc.l or c.l into that directory
Execute flex: flex calc.l
lex.yy.c will be generated