Flex – Lexical Analyzer Generator


Flex generates a scanner procedure directly from a specification consisting of regular expressions and user-written procedures.

Steps to using flex
1. Create a description (rules) file for flex to operate on.
2. Run flex on the input file. Flex produces a C file called lex.yy.c containing the scanning function yylex().
3. Run the C compiler on the C file to produce a lexical analyzer.

Flex Files and Procedure
Rule file (*.l) -> flex compiler -> lex.yy.c (scanner in C code)
lex.yy.c -> C compiler (linked with -lfl) -> scanner.exe
Test file -> scanner.exe -> tokens

Flex Programs
The flex input file consists of three sections
separated by a line with just %%
%{
auxiliary declarations
%}
regular definitions
%%
translation rules
%%
auxiliary procedures
Regular Expression Definitions Section
- The definitions section contains declarations of simple name definitions that simplify the scanner specification.
- Name definitions have the form:
name definition
- Example:
DIGIT    [0-9]
ID       [a-z][a-z0-9]*

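As an illustration of what the ID definition above describes, here is a hand-written C function that recognizes the same language. It is only a sketch, not flex output (flex compiles patterns into a table-driven automaton), and the name `match_id` is hypothetical. It assumes the default "C" locale, where `islower` accepts exactly a-z.

```c
#include <ctype.h>
#include <stddef.h>

/* Hand-coded equivalent of the flex definition
 *     ID   [a-z][a-z0-9]*
 * Returns the length of the longest prefix of s that matches ID,
 * or 0 if s does not start with an ID. */
size_t match_id(const char *s)
{
    size_t n;

    /* [a-z]: exactly one lowercase letter must come first */
    if (!islower((unsigned char)s[0]))
        return 0;
    n = 1;
    /* [a-z0-9]*: then any number of lowercase letters or digits */
    while (islower((unsigned char)s[n]) || isdigit((unsigned char)s[n]))
        n++;
    return n;
}
```

For example, `match_id("count2 = 0")` returns 6, and `match_id("2abc")` returns 0 because an ID must start with a lowercase letter.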
Translation Rules Section
P1    action1
P2    action2
...
Pn    actionn
where each Pi is a regular expression and each actioni is a C program segment.

Auxiliary Procedure Section
- This section is simply copied to lex.yy.c.
- This section is optional; if it is missing, the second %% in the input file may be skipped.
- In the definitions and rules sections, any indented text or text enclosed in %{ and %} is copied verbatim to the output (with the %{ and %} removed).

Rules
- Look for the longest token
    e.g., number
- Look for the first-listed pattern that matches the longest token
    e.g., keywords and identifiers
- List frequently occurring patterns first
    e.g., white space

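The first two rules can be modeled in C as follows. Each hypothetical matcher reports the longest prefix it can match; the scanner keeps the longest overall match and, on a tie, the rule listed earlier. This mirrors flex's documented choice (longest match, then first-listed rule), but the three rules and all function names are made up for illustration, and this is not how flex implements the choice internally.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* A matcher returns the length of the longest prefix of s it matches
 * (0 means no match). */
typedef size_t (*matcher)(const char *s);

static size_t m_if(const char *s)   /* pattern: if */
{
    return strncmp(s, "if", 2) == 0 ? 2 : 0;
}

static size_t m_id(const char *s)   /* pattern: [a-z][a-z0-9]* */
{
    size_t n;
    if (!islower((unsigned char)s[0]))
        return 0;
    for (n = 1; islower((unsigned char)s[n]) || isdigit((unsigned char)s[n]); n++)
        ;
    return n;
}

static size_t m_num(const char *s)  /* pattern: [0-9]+ */
{
    size_t n = 0;
    while (isdigit((unsigned char)s[n]))
        n++;
    return n;
}

/* Rules in the order they would be listed in a (hypothetical) .l file. */
enum { RULE_IF, RULE_ID, RULE_NUM, NRULES };
static const matcher rules[NRULES] = { m_if, m_id, m_num };

/* Choose the rule flex would: the longest match wins, and on a tie the
 * rule listed first wins. Returns -1 (with *len = 0) if nothing matches. */
int pick_rule(const char *s, size_t *len)
{
    int best = -1;
    size_t best_len = 0;

    for (int i = 0; i < NRULES; i++) {
        size_t n = rules[i](s);
        if (n > best_len) {   /* strictly longer, so earlier rules keep ties */
            best = i;
            best_len = n;
        }
    }
    *len = best_len;
    return best;
}
```

With these rules, `if x` goes to RULE_IF (the keyword and identifier patterns both match two characters, and the keyword is listed first), while `iffy` goes to RULE_ID because the identifier pattern matches a longer prefix.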
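Rules
- View keywords as exceptions to the rule for identifiers: construct a keyword table and look up each matched identifier in it
- Lookahead operator r1/r2: match a string in r1 only if it is followed by a string in r2
    DO 5 I = 1. 25
    DO 5 I = 1, 25
    DO/({letter}|{digit})*=({letter}|{digit})*,
  Fortran ignores blanks, so the first statement assigns 1.25 to the variable DO5I, while the second starts a DO loop. The pattern (written without blanks, because an unquoted blank ends a flex pattern) treats DO as a keyword only when a comma follows the assignment part.
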
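A minimal C sketch of both ideas, with made-up token codes and helper names. `classify_identifier` shows the keyword-table approach: the identifier pattern matches first, and the lexeme is then looked up in a table. `do_is_keyword` models the lookahead rule on a statement whose blanks have already been removed; it only decides whether the leading DO is a keyword, since in flex the text matched by r2 stays in the input.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical token codes (above 255 so they cannot clash with
 * single-character tokens). */
enum { ID = 258, IF, ELSE, WHILE };

/* Keyword table: consulted only after the identifier pattern has matched. */
static const struct {
    const char *name;
    int token;
} keywords[] = {
    { "if", IF }, { "else", ELSE }, { "while", WHILE },
};

/* Return the keyword's token if lexeme is reserved, otherwise ID. */
int classify_identifier(const char *lexeme)
{
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(lexeme, keywords[i].name) == 0)
            return keywords[i].token;
    return ID;
}

/* Lookahead DO/({letter}|{digit})*=({letter}|{digit})*, applied to a
 * Fortran statement with its blanks removed. Returns 1 if the leading
 * "DO" is the loop keyword, 0 otherwise. Only "DO" would become the
 * token; the rest is trailing context. */
int do_is_keyword(const char *s)
{
    size_t i = 2;

    if (strncmp(s, "DO", 2) != 0)
        return 0;
    while (isalnum((unsigned char)s[i]))   /* ({letter}|{digit})* */
        i++;
    if (s[i] != '=')                       /* = */
        return 0;
    i++;
    while (isalnum((unsigned char)s[i]))   /* ({letter}|{digit})* */
        i++;
    return s[i] == ',';                    /* , */
}
```

With blanks removed, `DO5I=1,25` yields 1 (a loop) and `DO5I=1.25` yields 0 (an assignment to DO5I).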
Functions and Variables
yylex()   a function implementing the lexical analyzer and returning the token matched
yytext    a global pointer variable pointing to the lexeme matched
yyleng    a global variable giving the length of the lexeme matched
yylval    an external global variable storing the attribute of the token

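To show how these names fit together, here is a hand-written stand-in for a generated scanner. It is a sketch under stated assumptions: the token codes, `set_input`, and the toy token set (numbers, identifiers, single characters) are hypothetical, and it declares its own `yytext`, `yyleng`, and `yylval` instead of using the ones flex and a parser would provide. As with flex, `yylex()` returns 0 at end of input.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token codes; 0 means end of input, as in flex. */
enum { NUMBER = 258, ID };

/* Stand-ins for the variables a flex scanner provides. */
static char lexeme[256];
char *yytext = lexeme;  /* the lexeme just matched, NUL-terminated */
int yyleng;             /* length of that lexeme */
int yylval;             /* attribute of the token (normally declared by the parser) */

static const char *cursor = "";  /* remaining input (hypothetical) */

void set_input(const char *s) { cursor = s; }

/* Hand-written stand-in for a generated yylex(): skips white space,
 * then matches a number, an identifier, or a single character. */
int yylex(void)
{
    const char *start;
    int tok;

    while (*cursor == ' ' || *cursor == '\t' || *cursor == '\n')
        cursor++;                           /* white space: no token */
    if (*cursor == '\0')
        return 0;                           /* end of input */

    start = cursor;
    if (isdigit((unsigned char)*cursor)) {
        while (isdigit((unsigned char)*cursor))
            cursor++;
        tok = NUMBER;
    } else if (isalpha((unsigned char)*cursor)) {
        while (isalnum((unsigned char)*cursor))
            cursor++;
        tok = ID;
    } else {
        tok = (unsigned char)*cursor++;     /* single-character token */
    }

    yyleng = (int)(cursor - start);
    if (yyleng > (int)sizeof lexeme - 1)    /* truncate overlong lexemes */
        yyleng = (int)sizeof lexeme - 1;
    memcpy(lexeme, start, (size_t)yyleng);
    lexeme[yyleng] = '\0';
    if (tok == NUMBER)
        yylval = atoi(yytext);              /* attribute: the numeric value */
    return tok;
}
```

For the input `x1 = 42;`, successive calls return ID (with `yytext` holding "x1" and `yyleng` equal to 2), '=', NUMBER (with `yylval` equal to 42), ';', and finally 0.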
Example
%{
#define END_OF_FILE 0   /* not EOF: lex.yy.c already uses stdio's EOF */
#define LE 25
...
%}
delim     [ \t\n]
ws        {delim}+
letter    [A-Za-z]
digit     [0-9]
id        {letter}({letter}|{digit})*
number    {digit}+(\.{digit}+)?(E[+\-]?{digit}+)?
%%
{ws}      { /* no action and no return */ }
if        { return (IF); }
else      { return (ELSE); }
{id}      { yylval = install_id(); return (ID); }
{number}  { yylval = install_num(); return (NUMBER); }
"<="      { yylval = LE; return (RELOP); }
"=="      { yylval = EQ; return (RELOP); }
...
<<EOF>>   { return (END_OF_FILE); }
%%
int install_id() { ... }
int install_num() { ... }

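The bodies of install_id() and install_num() are elided above. One possible shape for the identifier table is sketched below. Its signature is an assumption that differs from the example (it takes the lexeme explicitly, so a rule would call it as `install_id(yytext, yyleng)`), and the linear-search table is illustrative, not the original author's code; a hash table would scale better.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical symbol table: a growable array of lexeme copies. */
static char **symtab;
static int nsyms, capacity;

/* Enter the identifier text[0..len) if it is new and return its index
 * either way. Returns -1 if memory runs out. */
int install_id(const char *text, int len)
{
    for (int i = 0; i < nsyms; i++)
        if (strncmp(symtab[i], text, (size_t)len) == 0 && symtab[i][len] == '\0')
            return i;                        /* already present */

    if (nsyms == capacity) {                 /* grow the array */
        int newcap = capacity ? 2 * capacity : 16;
        char **p = realloc(symtab, (size_t)newcap * sizeof *p);
        if (p == NULL)
            return -1;
        symtab = p;
        capacity = newcap;
    }

    char *copy = malloc((size_t)len + 1);    /* yytext is reused, so copy it */
    if (copy == NULL)
        return -1;
    memcpy(copy, text, (size_t)len);
    copy[len] = '\0';
    symtab[nsyms] = copy;
    return nsyms++;
}
```

Installing `count` and then `total` gives indices 0 and 1; installing `count` again returns 0 instead of adding a duplicate.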
Lexical Error Recovery
- Error: none of the patterns matches a prefix of the remaining input
- Panic mode error recovery: delete successive characters from the remaining input until the pattern matching can continue
- Error repair:
    delete an extraneous character
    insert a missing character
    replace an incorrect character
    transpose two adjacent characters

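Both recovery strategies can be sketched in C. `panic_skip` deletes characters until one could start a token, using a hypothetical `can_start_token` test for a toy language; `classify_repair` checks which one of the four listed repairs, if any, turns an erroneous lexeme into an intended one. All names and the token alphabet are assumptions for illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Panic mode */

/* Hypothetical token-start test: in this toy language a token may begin
 * with a letter, a digit, white space, or one of + - * / = ( ) ; */
static int can_start_token(char c)
{
    return isalnum((unsigned char)c) || c == ' ' || c == '\t' || c == '\n' ||
           (c != '\0' && strchr("+-*/=();", c) != NULL);
}

/* Delete successive characters until pattern matching can continue
 * (or the input ends). Returns the number of characters deleted. */
size_t panic_skip(const char **p)
{
    size_t deleted = 0;
    while (**p != '\0' && !can_start_token(**p)) {
        (*p)++;
        deleted++;
    }
    return deleted;
}

/* Error repair */

enum repair { NO_REPAIR, DELETE_CHAR, INSERT_CHAR, REPLACE_CHAR, TRANSPOSE_CHARS };

/* Which single repair turns the erroneous lexeme bad into good?
 * Returns NO_REPAIR if none of the four does it in one step. */
enum repair classify_repair(const char *bad, const char *good)
{
    size_t lb = strlen(bad), lg = strlen(good), i = 0;

    while (i < lb && i < lg && bad[i] == good[i])
        i++;                                     /* skip the common prefix */
    if (lb == lg + 1 && strcmp(bad + i + 1, good + i) == 0)
        return DELETE_CHAR;                      /* bad has one extra char */
    if (lg == lb + 1 && strcmp(bad + i, good + i + 1) == 0)
        return INSERT_CHAR;                      /* bad is missing one char */
    if (lb == lg && i < lb) {
        if (strcmp(bad + i + 1, good + i + 1) == 0)
            return REPLACE_CHAR;                 /* one wrong char */
        if (i + 1 < lb && bad[i] == good[i + 1] && bad[i + 1] == good[i] &&
            strcmp(bad + i + 2, good + i + 2) == 0)
            return TRANSPOSE_CHARS;              /* two adjacent chars swapped */
    }
    return NO_REPAIR;
}
```

For instance, `panic_skip` on `@#$x = 1` deletes three characters and resumes at `x`, and `classify_repair("wihle", "while")` reports a transposition.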
Maintaining Line Numbers
Flex can maintain the number of the current line in the global variable yylineno. To enable this, place the option
%option yylineno
in the first (definitions) section.

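Conceptually, the option makes the scanner count the newlines in each matched lexeme. The sketch below models that bookkeeping with a hypothetical `count_lines` helper; flex's generated scanner does this internally, so a flex user never calls such a function.

```c
/* Stand-in for the counter that %option yylineno maintains; flex's
 * generated scanner also starts it at 1. */
int yylineno = 1;

/* Hypothetical helper: after a rule matches text[0..len), advance the
 * line counter by the number of newlines the lexeme contains. */
void count_lines(const char *text, int len)
{
    for (int i = 0; i < len; i++)
        if (text[i] == '\n')
            yylineno++;
}
```

After a lexeme containing two newlines, `yylineno` is 3; a lexeme without newlines leaves it unchanged.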
Flex: Regular Expressions
x          match the character 'x'
.          any character (byte) except newline
[xyz]      a "character class"; in this case, the pattern matches either an 'x', a 'y', or a 'z'
[abj-oZ]   a "character class" with a range in it; matches an 'a', a 'b', any letter from 'j' through 'o', or a 'Z'
[^A-Z]     a "negated character class", i.e., any character but those in the class; in this case, any character EXCEPT an uppercase letter
[^A-Z\n]   any character EXCEPT an uppercase letter or a newline

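One way to experiment with these character classes from C is the POSIX `<regex.h>` API (`regcomp`, `regexec`, `regfree`), whose extended syntax treats the bracket expressions above the same way. This is a sketch, not flex itself: POSIX ERE differs from flex in places (a backslash inside brackets is literal in POSIX, so `[^A-Z\n]` does not carry over, and `.` matches a newline unless `REG_NEWLINE` is set). The helper `full_match` is hypothetical; it anchors the pattern so the whole string must match, like a single token.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if the whole of s matches the POSIX
 * extended regular expression pattern, 0 if it does not, and -1 if the
 * pattern is too long or fails to compile. */
int full_match(const char *pattern, const char *s)
{
    char anchored[256];
    regex_t re;
    int rc;

    /* Anchor both ends so the entire string must match, like one token. */
    if (snprintf(anchored, sizeof anchored, "^(%s)$", pattern) >= (int)sizeof anchored)
        return -1;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, s, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

In the default "C" locale, `full_match("[abj-oZ]", "k")` returns 1 and `full_match("[abj-oZ]", "p")` returns 0.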
Flex: Regular Expressions
r*             zero or more r's, where r is any regular expression
r+             one or more r's
r?             zero or one r (that is, "an optional r")
r{2,5}         anywhere from two to five r's
r{2,}          two or more r's
r{4}           exactly 4 r's
{name}         the expansion of the "name" definition (see the definitions section above)
"[xyz]\"foo"   the literal string: [xyz]"foo
\X             if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v', then the ANSI-C interpretation of \X; otherwise, a literal 'X' (used to escape operators such as '*')

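The repetition operators can be modeled for the simple case where r is a literal string: count how many consecutive copies of r make up the input and check that count against the bounds. The helper `repeats` is hypothetical, and using `max < 0` for "no upper bound" is a convention of this sketch; the comment shows how `*`, `+`, `?`, and `{4}` reduce to the general `{m,n}` form.

```c
#include <string.h>

/* Does s consist of exactly k copies of the literal string unit, with
 * min <= k <= max? max < 0 means "no upper bound". This models flex's
 * r{min,max} for a literal r; the other operators are special cases:
 *   r* = r{0,}    r+ = r{1,}    r? = r{0,1}    r{4} = r{4,4} */
int repeats(const char *s, const char *unit, int min, int max)
{
    size_t ulen = strlen(unit);
    int k = 0;

    if (ulen == 0)
        return 0;                            /* empty unit: not meaningful here */
    while (strncmp(s, unit, ulen) == 0) {    /* consume one copy of unit */
        s += ulen;
        k++;
    }
    return *s == '\0' && k >= min && (max < 0 || k <= max);
}
```

For example, `ababab` is three copies of `ab`, so it satisfies `(ab){2,5}` but not `(ab){4}`; the empty string satisfies `(ab)*` but not `(ab)+`.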
Flex: Regular Expressions
\0       a NUL character (ASCII code 0)
\123     the character with octal value 123
\x2a     the character with hexadecimal value 2a
(r)      match an r; parentheses are used to override precedence
rs       the regular expression r followed by the regular expression s; called "concatenation"
r|s      either an r or an s
^r       an r, but only at the beginning of a line (i.e., when just starting to scan, or right after a newline has been scanned)
r$       an r, but only at the end of a line (i.e., just before a newline); equivalent to "r/\n"

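The two anchors concern context rather than characters: `^r` checks what precedes the match and `r$` checks what follows it. Below is a sketch for a literal r over an in-memory buffer, using a hypothetical `match_literal` helper; as with `r/\n`, the newline must be present but is not counted in the match length.

```c
#include <string.h>

/* Sketch of flex's ^r and r$ for a literal r (hypothetical helper).
 * buf is the whole input and pos the current scan position.
 *   ^r : r matches only when pos is at the start of the input or
 *        just after a newline.
 *   r$ : r matches only when followed by '\n' (like r/\n); the
 *        newline is not part of the returned match length.
 * Returns the match length, or 0 for no match. */
size_t match_literal(const char *buf, size_t pos, const char *r,
                     int need_bol, int need_eol)
{
    size_t n = strlen(r);

    if (need_bol && pos > 0 && buf[pos - 1] != '\n')
        return 0;                      /* ^ fails: not at a line start */
    if (strncmp(buf + pos, r, n) != 0)
        return 0;                      /* r itself does not match */
    if (need_eol && buf[pos + n] != '\n')
        return 0;                      /* $ fails: no newline follows */
    return n;
}
```

In the buffer `"end\nend x\n"`, the second `end` (at position 4) satisfies `^end` but not `^end$`, because a blank follows it.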
Execute Flex
- Create a directory in Cygwin, e.g., /usr/src/compiler
- Download calc.l or c.l
- Execute flex:
    flex calc.l
  lex.yy.c will be generated